License: arXiv.org perpetual non-exclusive license
arXiv:2312.04692v1 [cs.CR] 07 Dec 2023

Diffence: Fencing Membership Privacy With Diffusion Models

Yuefeng Peng, University of Massachusetts Amherst
Ali Naseh, University of Massachusetts Amherst
Amir Houmansadr, University of Massachusetts Amherst
Abstract

Deep learning models, while achieving remarkable performance across various tasks, are vulnerable to membership inference attacks, wherein adversaries identify whether a specific data point was part of a model’s training set. This susceptibility raises substantial privacy concerns, especially when models are trained on sensitive datasets. Current defense methods often struggle to provide robust protection without hurting model utility, and they often require retraining the model or using extra data. In this work, we introduce a novel defense framework against membership attacks by leveraging generative models. The key intuition of our defense is to remove the differences between member and non-member inputs that can be used to perform membership attacks, by re-generating input samples before feeding them to the target model. Our defense therefore works pre-inference, unlike prior defenses, which operate either at training time (modifying the model) or post-inference (modifying the model’s output).

A unique feature of our defense is that it works on input samples only, without modifying the training or inference phase of the target model. Therefore, it can be cascaded with other defense mechanisms as we demonstrate through experiments. Through extensive experimentation, we show that our approach can serve as a robust plug-n-play defense mechanism, enhancing membership privacy without compromising model utility in both baseline and defended settings. For example, our method enhanced the effectiveness of recent state-of-the-art defenses, reducing attack accuracy by an average of 5.7% to 12.4% across three datasets, without any impact on the model’s accuracy. By integrating our method with prior defenses, we achieve new state-of-the-art performance in the privacy-utility trade-off.

1 Introduction

Deep learning has made remarkable progress in recent years. However, it has been demonstrated that deep learning models can memorize information from their training data [1, 2, 3], making them susceptible to membership inference attacks (MIAs). Such attacks seek to infer whether a specific data point was part of a model’s training set [4]. Given that deep learning models are often trained on sensitive data, including facial images and medical records, the potential success of MIAs poses a significant threat to individual privacy [4].

Figure 1: Examples of original samples and their reconstructions on CIFAR10. Original samples are successfully identified as members by the adversary, while the reconstructed samples are classified as non-members.
Figure 2: An illustration of different defense stages and our proposed defense. Defenses are categorized according to their implementation stages in the machine learning pipeline: the training phase, pre-inference phase, and post-inference phase. Different defense strategies can be deployed at various stages for an integrated defense approach. Our method, uniquely positioned in the pre-inference phase, is compatible with all other methods.
TABLE I: A comparison to prior works. ✓ means the defense has the stated requirement, - otherwise.

| Technique | Requires Re-training | Requires Additional Data | Impact on Model Accuracy | Deployment Stage |
|---|---|---|---|---|
| AdvReg [5] | ✓ | ✓ | High | Training |
| MemGuard [6] | - | ✓ | None | Post-Inference |
| DPSGD [7] | ✓ | - | High | Training |
| SELENA [8] | ✓ | - | Low | Training |
| RelaxLoss [9] | ✓ | - | None | Training |
| HAMP [10] | ✓ | - | Low | Training |
| Ours (Scenario 1) | - | ✓ | None | Pre-inference |
| Ours (Scenario 2) | - | - | None | Pre-inference |
| Ours (Scenario 3) | - | - | None | Pre-inference |

The growing threat of MIAs has motivated the development of different defense strategies. Existing defense mechanisms fall into two main categories based on the phase of the machine learning pipeline in which they are applied. The first category encompasses defenses that operate during the training phase. These methods employ privacy-preserving techniques to train models in a way that reduces their memorization of training data, thereby mitigating privacy risks [7, 5, 8, 9, 10]. Key approaches within this category include Differentially Private Stochastic Gradient Descent (DPSGD) [7], adversarial regularization [5], and knowledge distillation [8]. The second category targets the post-inference phase, focusing on reducing membership privacy leakage by directly rectifying the disparities in the model’s outputs between members and non-members. A representative defense in this category is MemGuard [6], which counters MIAs by altering the confidence vector to fool the adversary’s attack model. Despite the variety of approaches proposed, existing defenses still suffer from several limitations, as summarized in Table I. For example, they often struggle to achieve an optimal privacy-utility trade-off [9, 8]. Additionally, some defenses rely on extra data [5, 6].

A new category for MIA defense: In this paper, we introduce a third category of defense mechanisms that operates in the pre-inference stage. Unlike the two categories introduced above, a pre-inference defense alters neither the model outputs nor the model itself. Instead, it modifies input samples before they are sent to the target model for inference. To the best of our knowledge, no prior work has developed a pre-inference defense against MIAs. Figure 2 compares the three categories of MIA defense techniques, and Table I shows example mechanisms across these categories.

Our contributions. In this paper, we introduce a novel diffusion model-based defense that can enhance the privacy of both undefended models and models equipped with other defenses, without sacrificing any utility. Advanced generative models such as GANs [11] and diffusion models [12] have previously been employed during inference to safeguard deep learning models from adversarial attacks [13]. However, the application of generative models during inference to defend against privacy attacks, specifically MIAs, remains largely unexplored. In this paper, we show that diffusion models can also serve as powerful tools integrated into the defense framework against MIAs, preserving membership privacy without harming model utility.

The target model’s memorization of members results in distinct behaviors when classifying seen (member) versus unseen (non-member) samples. MIAs exploit these behavioral discrepancies to distinguish between members and non-members, though different attack methods may leverage different features. In this work, we focus on black-box attacks, which typically infer membership by utilizing information from the sample outputs. He et al. [14] categorized the information exploited in these attacks into two parts: prediction posteriors and labels. We contend that the fundamental cause of MIAs is the non-negligible gap between the model’s outputs for members and non-members. Building on this, we divide the features exploitable by attacks into two categories: the train-to-test accuracy gap and the prediction distribution gap. These features, as utilized in existing attacks [4, 15, 16, 5, 17], are summarized in Table II. Note that some attacks which rely solely on labels [18, 19] may also indirectly exploit the prediction distribution gap through the robustness of the predicted labels.

TABLE II: Summary of the features and information utilized by various attacks. ✓ means the specific gap is exploited by the attack, - otherwise. Regarding features, P denotes the use of prediction posteriors and L denotes the use of labels.

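| Attacks | Train-to-test Accuracy Gap | Prediction Distribution Gap | Features |
|---|---|---|---|
| NN-based attacks [15, 4, 5] | ✓ | ✓ | P,L |
| Metric-corr [16] | ✓ | - | L |
| Metric-loss [16] | ✓ | ✓ | P,L |
| Metric-conf [15] | - | ✓ | P |
| Metric-ent [17] | - | ✓ | P |
| Metric-ment [15] | ✓ | ✓ | P,L |
| Label-only attacks [18, 19] | ✓ | ✓ | P,L |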

A successful MIA defense should fundamentally eliminate or shrink the two gaps described above between members and non-members. Our defense mainly focuses on eliminating the prediction distribution gap while leaving the samples’ predicted labels unchanged to avoid any loss in utility. This is achieved by reconstructing the samples using a generative model before they are input into the target model. Through sample reconstruction, the model encounters samples in the inference stage that are not exact replicas of those observed during training, regardless of whether they are member or non-member samples. We show that this method effectively reduces the discrepancy in the prediction distributions between members and non-members.

Our method involves generating multiple reconstructed images and selecting the one that best suits our purpose. Among the generated samples, only those with the same predicted label as the original sample are considered as candidates for selection. This ensures that the model’s predictive outcomes remain unchanged, thereby maintaining the model’s accuracy. We designed different sample selection strategies for three scenarios based on what the defender is assumed to know. In these scenarios, the defender has access to both some member and non-member samples, has access only to members, or has access solely to a trained model without any knowledge of samples’ memberships. We demonstrate that our method is effective across all three scenarios. We show that with more information, defenders can perform a more nuanced analysis of the prediction distribution. This enables a more strategic selection of reconstructed images, forcing the prediction distributions of members and non-members to align more closely, thereby providing robust protection. Figure 1 provides some example inputs and their selected reconstructions. The membership of the original samples was successfully inferred by attackers, while their reconstructed versions prevent the disclosure of membership.

While our defense can be implemented using arbitrary generative models, we use diffusion models [20] due to their state-of-the-art performance in generative tasks [12]; we therefore call our mechanism Diffence, i.e., a diffusion-based fence against MIAs. A diffusion model operates in two phases: (i) the forward diffusion phase incrementally adds noise to samples, and (ii) the reverse generation phase generates new samples by applying multi-step denoising. To generate a new sample from an existing one, the original sample is subjected to a specific number of forward diffusion steps, followed by the corresponding reverse denoising steps. The procedure is suitable for reconstructing an existing sample in detail. The resultant sample, although distinct from the original in irrelevant details, retains identical semantic attributes, which aligns with our desired defensive objective. Note that our pre-inference defense can be cascaded with other (i.e., training or post-inference) defense mechanisms, due to its minimal deployment constraints and its plug-and-play flexibility.

We conduct comprehensive experiments, comparing our defense mechanisms against six state-of-the-art defense strategies across three datasets using two popular model architectures, ResNet [21] and DenseNet [22]. Our empirical results validate our approach’s effectiveness in enhancing membership privacy without decreasing the model utility, irrespective of the model’s operational context—be it a baseline (vanilla) or a defended setting.

We summarize our contributions as follows:

  1. We propose a novel diffusion model-based membership inference defense framework, which can enhance the membership privacy of pre-existing (both undefended and defended) models without compromising the utility of the model.

  2. We propose a new defense pipeline, which for the first time combines defenses deployed at different stages to achieve better defense performance.

  3. We implemented a prototype of the proposed method. Our extensive experiments show that our proposed defense can effectively improve the membership privacy of existing models without utility loss. For example, we reduced the attack accuracy against the most recent state-of-the-art defenses, HAMP and RelaxLoss, from 57.41% to 53.61% and from 63.51% to 60.13% on CIFAR100, achieving new state-of-the-art performance without any loss in model accuracy. Furthermore, we show that in certain settings our method can enhance both the utility and privacy of the model.

2 Background and Preliminaries

2.1 Diffusion Models

Generative image modeling has seen remarkable advancements in recent years, with Generative Adversarial Networks (GANs) [11] and Variational Autoencoders (VAEs) [23] standing out as the pioneering architectures for synthesizing realistic images [24, 25, 26, 27]. While these frameworks have laid the groundwork and achieved significant successes, diffusion models [28] have recently emerged, surpassing their predecessors in terms of performance and establishing themselves as a leading approach in the domain of image synthesis [12].

Denoising Diffusion Probabilistic Models (DDPMs) [20] have emerged as a significant advancement in generative image modeling and operate by reversing a diffusion process. This diffusion mechanism can be expressed as

$$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t \tag{1}$$

where $\epsilon_t$ represents Gaussian noise, and $\beta_t$ dictates the magnitude of noise introduced at each iteration. After a set number of timesteps, typically denoted as $T$, the data $x_T$ becomes predominantly noise-infused.
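As a rough illustration of Equation 1, the sketch below applies the forward step repeatedly to a toy array. The linear schedule for $\beta_t$ (from 1e-4 to 0.02 over 1000 steps) and the names `forward_step` and `forward_diffuse` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One step of Eq. (1): x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t."""
    eps_t = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps_t

def forward_diffuse(x0, betas, rng):
    """Apply the forward step once per entry of `betas`."""
    x = x0
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear noise schedule
x0 = np.ones((64, 64))                  # toy constant "image"
x_T = forward_diffuse(x0, betas, rng)

# Fraction of the original signal amplitude that survives all T steps.
signal_scale = np.sqrt(np.prod(1.0 - betas))
print(signal_scale, x_T.mean(), x_T.std())
```

Under this assumed schedule the surviving signal amplitude is well below 1% and the pixel values are close to unit-variance Gaussian noise, which matches the statement that $x_T$ is predominantly noise.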

Reconstructing meaningful samples involves reversing this process. DDPM trains a denoising function which, when given a noised sample $x_t$, predicts the less-noisy version from the preceding timestep, $x_{t-1}$. The optimization goal is to minimize the difference between the true and the predicted data, typically through a mean squared error loss. During the sampling phase, DDPM starts from a sample of the noise distribution and employs the trained denoising function to reverse the diffusion, yielding samples that closely mirror the original data distribution.

2.2 Threat Model

2.2.1 Attacker

Black-box Access: Following prior works [6, 10], we assume the attacker has black-box access to the target model. This means the attacker can query the target model using a black-box API and receive corresponding prediction vectors, while direct access to the model’s parameters remains restricted.

Partial Knowledge of Membership: Like previous defenses [8, 10], we assume a strong attacker that knows a limited number of samples from both the training and test sets, i.e., it knows the membership of some members and some non-members. The attack’s goal is to infer the membership of any other unexposed sample.

Full Knowledge of Defense Technique: We assume the attacker is fully aware of the defense technique’s deployment and the architecture of the target model. As a result, for attacks that need shadow models, the attacker can train shadow models that mirror the target model’s training method. In the context of our approach, we assume the attacker has full knowledge of the diffusion model and how it is deployed.

2.2.2 Defender

We assume the defender possesses a private dataset and uses it to train a target model. The defender’s objective is to securely release the target model for user accessibility. The defender aims to strike a balance between achieving minimal membership privacy leakage and maintaining high classification accuracy.

3 Methodology

Our main idea is to eliminate the differences in predictive behavior between members and non-members, thereby fundamentally mitigating membership privacy leakage. As described in Section 1, we categorize these disparities into the train-to-test accuracy gap and the prediction distribution gap. Our method primarily targets the elimination of the prediction distribution gap. However, by combining our approach with other defenses, we can simultaneously address both gaps, achieving state-of-the-art defense performance. In this section, we first explain how differences in predictive behavior can lead to privacy leakage. We then discuss how we design our defense to narrow these gaps.

3.1 Prediction Gaps between Members and Non-members

Prior studies on MIAs have highlighted a distinct model behavior concerning training and test data, pinpointing it as a key contributor to membership privacy leakage [29]. The gap in prediction distributions between members and non-members can be reflected in various features. These may include assigning higher confidence levels, lower loss, and reduced prediction entropy to members. The disparities in these features are key factors leading to membership privacy leakage, with almost all effective MIAs exploiting one or several of these feature discrepancies [16, 4, 17, 29]. Moreover, these features are interrelated; for instance, lower loss often implies reduced prediction entropy and higher confidence levels. Carlini et al. [29] also proposed parametric modeling of prediction confidence to achieve a more Gaussian-like distribution. Such a distribution more effectively distinguishes the prediction distribution gap between members and non-members. In our subsequent discussions, we adopt this parametric modeling approach to represent the prediction gap. The only difference in our approach is the selection of the maximum confidence from the output vector, rather than the confidence of the correct class, as our focus is on the prediction distribution gap rather than the train-to-test accuracy gap. The parametric modeling function is shown in equation 2.

$$\phi(p) = \log\left(\frac{p}{1-p}\right), \text{ for } p = \max(f(x)) \tag{2}$$
Figure 3: Prediction distribution gap between members and non-members: The figure illustrates the disparities in prediction distributions with respect to confidence levels, prediction entropy, and parametrically modeled confidence (predicted logits).

Figure 3 plots histograms illustrating the differences in predictions for members and non-members on CIFAR-100 without defenses. There are significant disparities in the output distributions between members and non-members, especially in the parametrically modeled confidence, which we refer to as the logit in this paper. Attackers can easily exploit these differences to distinguish between members and non-members.
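The logit feature of Equation 2 can be computed directly from a model's output probabilities. The following is a minimal sketch; the function name `logit_of_max_confidence` and the clipping constant `eps` (added only to avoid division by zero when the confidence is exactly 1) are assumptions of this example.

```python
import numpy as np

def logit_of_max_confidence(probs, eps=1e-12):
    """Eq. (2): phi(p) = log(p / (1 - p)) with p the maximum class probability.

    probs: array of shape (num_samples, num_classes), rows summing to 1.
    eps:   assumed clipping constant that keeps the logarithm finite.
    """
    p = np.max(probs, axis=-1)
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)

probs = np.array([[0.5, 0.5],
                  [0.9, 0.1]])
logits = logit_of_max_confidence(probs)
print(logits)
```

A confidence of 0.5 maps to a logit of 0, and a confidence of 0.9 maps to log(9), so highly confident (typically member) predictions are spread out along the positive axis rather than compressed near 1.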

3.2 Proposed Diffence

To solve the issues described above, we propose a novel method that leverages input reconstruction to eliminate the prediction disparity between members and non-members. Our core idea is that by reconstructing samples, the model, during classification, encounters samples that are distinct from those it has seen during the training phase, irrespective of whether they are from members or non-members, thereby reducing inconsistencies in predictions. Our method aims to reconstruct the finer details of samples without altering their semantic content. To this end, we employ diffusion models as our generative model, as they align well with our objectives. We provide more details below.

3.2.1 Sample Reconstruction

For each input image, we apply a two-phase process to obscure and subsequently reconstruct its inherent details. First, we apply the forward diffusion procedure using the closed-form expression in Equation 3 provided by Ho et al. [20] to add Gaussian noise to the image.

$$\bm{x}_t = \sqrt{\bar{\alpha}_t}\,\bm{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\bm{\epsilon} \tag{3}$$

where $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t}\alpha_i$. Then, the reverse process is applied, aiming to recover the original image from the noised image.
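The closed-form forward step of Equation 3 can be sketched as follows. The linear $\beta$ schedule, the step count `t = 50`, and the helper names `alpha_bar` and `q_sample` are assumptions chosen for illustration; the reverse (denoising) pass requires a trained diffusion model and is not shown.

```python
import numpy as np

def alpha_bar(betas):
    """Cumulative product of alpha_i = 1 - beta_i; entry t-1 holds alpha_bar_t."""
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, betas, rng):
    """Eq. (3): jump directly from x_0 to x_t (t is 1-indexed)."""
    a_bar_t = alpha_bar(betas)[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar_t) * x0 + np.sqrt(1.0 - a_bar_t) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear noise schedule
x0 = rng.uniform(-1.0, 1.0, (3, 32, 32))   # toy image scaled to [-1, 1]

t = 50                                     # assumed small diffusion step
x_t = q_sample(x0, t, betas, rng)
a_bar_t = alpha_bar(betas)[t - 1]
print(a_bar_t)
```

With a small step count, $\bar{\alpha}_t$ stays close to 1, so the noised image remains dominated by the original content and only fine detail is disturbed before denoising.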

The two-step reconstruction of the image aligns well with our defensive objectives. If we look into the frequency domain by applying the Fourier transform to both the input image $x_0$ and the noise $\epsilon$, we get:

$$\mathcal{F}(\bm{x}_t) = \sqrt{\bar{\alpha}_t}\,\mathcal{F}(\bm{x}_0) + \sqrt{1-\bar{\alpha}_t}\,\mathcal{F}(\bm{\epsilon}) \tag{4}$$

Typically, images display a high response to low-frequency content and a notably weaker response to high-frequency content. This is because of the inherent smoothness of most images. The dominant low-frequency components encapsulate essential visual information, while the high-frequency components, associated with fine details and edges, are comparatively subdued.

Considering a small $t$, where $\bar{\alpha}_t$ closely approximates 1, the perturbations in the frequency domain are minor. Note that the Fourier transform of a Gaussian sample is itself Gaussian. Consequently, at small $t$ values, the forward process washes out the high-frequency content without perturbing the low-frequency content much. This leads to a faster alteration of the high-frequency elements compared to the low-frequency ones in the forward process. This dynamic is also relevant in the context of the reverse process of denoising models. At lower $t$ values, these models predominantly focus on reconstructing high-frequency content [30].
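The frequency argument can be checked numerically on synthetic data. The sketch below builds a toy image whose amplitude spectrum decays as 1/f (an assumed stand-in for the smoothness of natural images), applies the two terms of Equation 4 with an assumed $\bar{\alpha}_t = 0.99$, and compares the signal-to-noise power ratio in a low-frequency and a high-frequency band. The band thresholds are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Radial spatial frequency of every FFT bin.
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
radius = np.sqrt(fx**2 + fy**2)
radius[0, 0] = 1.0 / n   # avoid dividing by zero at the DC bin

# Toy "natural" image: white noise shaped to a 1/f amplitude spectrum.
white = rng.standard_normal((n, n))
x0 = np.real(np.fft.ifft2(np.fft.fft2(white) / radius))
x0 = (x0 - x0.mean()) / x0.std()

a_bar = 0.99             # assumed value of alpha_bar_t for a small t
eps = rng.standard_normal((n, n))

# Power spectra of the two terms on the right-hand side of Eq. (4).
signal_power = np.abs(np.fft.fft2(np.sqrt(a_bar) * x0)) ** 2
noise_power = np.abs(np.fft.fft2(np.sqrt(1.0 - a_bar) * eps)) ** 2

low = radius < 0.1       # assumed low-frequency band
high = radius > 0.4      # assumed high-frequency band
snr_low = signal_power[low].sum() / noise_power[low].sum()
snr_high = signal_power[high].sum() / noise_power[high].sum()
print(snr_low, snr_high)
```

Because the added noise is spectrally flat while the image power is concentrated at low frequencies, the signal-to-noise ratio in the high band is far smaller than in the low band, which is the sense in which a small amount of diffusion disturbs fine detail first.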

In our defense strategy, we denote the diffusion step applied to each sample as $T$. By selecting an appropriate value for $T$, we can preserve the essential semantic content of each image while altering its finer details. The impact of the diffusion step $T$ on the performance of our defense is further explored in Section 4.3.2. During the inference phase, the two-step reconstruction procedure reduces the gap between the prediction distributions of members and non-members.

3.2.2 Sample Selection

Figure 4: Prediction distribution gap between members and non-members under our defense across the three scenarios. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3. This figure displays the distribution of predicted logits on the CIFAR100 dataset using ResNet with our defense in the three scenarios.

We observed that, because sample generation is stochastic, generating only one reconstructed sample per image can occasionally produce a low-quality sample, potentially decreasing the model’s test accuracy. To address this issue, we propose generating multiple reconstructed images for each original image. Then, in alignment with the specific needs of our defense strategy, we carefully select the most appropriate one to use as the final input sample.

In our approach, for each original image $I_i$ requiring privacy protection, we generate $N$ reconstructed versions, denoted as $R_i = \{R_{i1}, R_{i2}, \ldots, R_{iN}\}$. The model $f$ outputs a prediction for each reconstructed image $R_{ij}$, i.e., $f(R_{ij})$. From these, we identify the subset of reconstructions whose predicted labels match the predicted label of the original image $I_i$, forming a set of candidate reconstructions, denoted as $C$. We then select the most appropriate reconstructed image from the candidates. Selecting a sample from this candidate set ensures that the original prediction label of the sample remains unchanged, guaranteeing that our method does not impact the model’s accuracy.

$$P_i = \text{select}(\{f(R_{ij}) \mid f(R_{ij}) = f(I_i),\ j = 1, \ldots, N\}) \tag{5}$$

Here, $P_i$ represents the final prediction for the original image $I_i$, derived from the model’s prediction for the selected reconstruction. The selection function, denoted as $\text{select}(\cdot)$, chooses the optimal reconstruction from $R_i$ based on a criterion that evaluates the predictions of all $N$ reconstructions. This criterion is aligned with the defense’s objectives and may take into account the defender’s prior knowledge about the prediction distributions of members and non-members. We categorize our defense into three scenarios based on the information available to the defender and have accordingly designed distinct sample selection strategies for each.
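The candidate filtering in Equation 5 can be sketched as below, operating on prediction vectors that are assumed to have been computed already. The function names, the uniformly random selection criterion, and the fallback to the original prediction when no reconstruction keeps the label are all assumptions of this example; the text above does not specify what happens when the candidate set is empty.

```python
import numpy as np

def select_prediction(probs_original, probs_recon, select_fn):
    """Sketch of Eq. (5).

    probs_original: shape (K,), model output f(I_i) for the original image.
    probs_recon:    shape (N, K), model outputs f(R_ij) for N reconstructions.
    select_fn:      criterion that picks one row from the candidate rows.
    """
    label = int(np.argmax(probs_original))
    keep = np.argmax(probs_recon, axis=1) == label
    candidates = probs_recon[keep]
    if len(candidates) == 0:
        # Assumption (not from the paper): fall back to the original output.
        return probs_original
    return select_fn(candidates)

rng = np.random.default_rng(0)

def random_choice(candidates):
    """Assumed placeholder criterion: pick one candidate uniformly at random."""
    return candidates[rng.integers(len(candidates))]

probs_original = np.array([0.7, 0.2, 0.1])
probs_recon = np.array([[0.6, 0.3, 0.1],    # label 0: candidate
                        [0.2, 0.7, 0.1],    # label 1: discarded
                        [0.5, 0.4, 0.1]])   # label 0: candidate
chosen = select_prediction(probs_original, probs_recon, random_choice)
print(chosen)
```

Whatever criterion is plugged in as `select_fn`, the returned vector always has the same top label as the original prediction, which is why the selection step leaves accuracy unchanged.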

Scenario 1. In this scenario, we assume that the defender has access to subsets of both member and non-member data. The defender initially generates $N$ reconstructed samples for each data point and then plots the prediction distributions for all these reconstructions. Through the analysis of these distribution plots, the defender can identify an optimal interval where the predictions should ideally fall to maximize the reduction of the prediction distribution gap between members and non-members. Specifically, a grid search is conducted over the overlapping region of the member and non-member prediction distributions, $[\min(\text{logit}_{\text{mem}}), \max(\text{logit}_{\text{nonmem}})]$, to determine this interval. The chosen interval aims to minimize the Jensen-Shannon (JS) divergence [31] between the prediction distributions of members and non-members within the selected range. This approach is based on the observation that a lower JS divergence between member and non-member prediction distributions typically indicates reduced membership privacy leakage [14]. This correlation has been confirmed by our experiments, as detailed in Section 4.3.1.

Defenders require only a small subset of samples to select the optimal interval. After the interval is determined, for each sample requiring protection, the defender randomly selects one sample from the candidate set $C$ whose prediction falls within this interval. If none of the candidates are within the interval, the closest one is chosen. Another possible approach involves continuously generating reconstructions until the sample falls within the defined interval. However, we found this method to be inefficient. More details on this are discussed in Section 4.3.1.
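One possible reading of the Scenario 1 procedure is sketched below on synthetic logits. For every interval on a grid, it simulates the selection rule (random candidate inside the interval, otherwise the closest one) and scores the interval by the JS divergence between histograms of the selected member and non-member logits. The grid resolution, histogram binning, Gaussian toy data, and all function names are assumptions; the paper's actual search may differ.

```python
import numpy as np

def js_divergence(a, b, bins):
    """Jensen-Shannon divergence (in nats) between histograms of a and b."""
    p, _ = np.histogram(a, bins=bins)
    q, _ = np.histogram(b, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(u, v):
        mask = u > 0
        return np.sum(u[mask] * np.log(u[mask] / v[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pick_logit(cand_logits, lo, hi, rng):
    """Random candidate inside [lo, hi]; otherwise the closest candidate."""
    inside = cand_logits[(cand_logits >= lo) & (cand_logits <= hi)]
    if inside.size > 0:
        return rng.choice(inside)
    dist = np.where(cand_logits < lo, lo - cand_logits, cand_logits - hi)
    return cand_logits[np.argmin(dist)]

def search_interval(mem, nonmem, grid_size=10, n_bins=20, seed=0):
    """mem, nonmem: arrays of shape (num_samples, N) holding candidate logits.

    Assumes the distributions overlap, i.e. mem.min() < nonmem.max().
    Returns (best_js, lo, hi).
    """
    grid = np.linspace(mem.min(), nonmem.max(), grid_size)
    bins = np.linspace(min(mem.min(), nonmem.min()),
                       max(mem.max(), nonmem.max()), n_bins + 1)
    best = None
    for i, lo in enumerate(grid[:-1]):
        for hi in grid[i + 1:]:
            rng = np.random.default_rng(seed)
            sel_mem = np.array([pick_logit(r, lo, hi, rng) for r in mem])
            sel_non = np.array([pick_logit(r, lo, hi, rng) for r in nonmem])
            js = js_divergence(sel_mem, sel_non, bins)
            if best is None or js < best[0]:
                best = (js, lo, hi)
    return best

# Synthetic logits: members are assumed to be more confident on average.
data_rng = np.random.default_rng(1)
mem = data_rng.normal(6.0, 2.0, size=(200, 8))
nonmem = data_rng.normal(4.0, 2.0, size=(200, 8))
best_js, lo, hi = search_interval(mem, nonmem)
print(best_js, lo, hi)
```

Simulating the selection rule inside the search, rather than only truncating the histograms, avoids rewarding intervals that almost no candidate can reach, since out-of-interval samples still contribute through the closest-candidate fallback.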

Figure 4 provides an example of our defense strategy as applied to the CIFAR-100 dataset. It illustrates the prediction distribution of reconstructed samples for 1000 data points which is used to select the optimal interval. The figure also showcases the distribution of prediction logits for both members and non-members after the selection of the optimal interval using the steps outlined in our defense strategy. This visualization clearly demonstrates the effectiveness of our approach in aligning the prediction distributions of member and non-member data, thereby minimizing the potential for privacy leakage.

Scenario 2. In this scenario, the defender only has access to a subset of members, so the interval selection must rely solely on the prediction distribution of member reconstructions. In such cases, where only the member prediction distribution is available, we set the interval as $[\min(\text{logit}_{\text{mem}}), \text{mean}(\text{logit}_{\text{mem}})]$. This is based on the observation that members typically exhibit higher confidence levels, and often, the lower half of the member distribution significantly overlaps with the upper half of the non-member distribution. As Figure 4 shows, this approach also significantly narrows the gap between the prediction logits of members and non-members.
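The Scenario 2 interval needs only the logits of member reconstructions. A minimal sketch, with a hypothetical function name and toy values:

```python
import numpy as np

def member_only_interval(member_logits):
    """Scenario 2 interval: [min(logit_mem), mean(logit_mem)]."""
    member_logits = np.asarray(member_logits, dtype=float)
    return float(member_logits.min()), float(member_logits.mean())

# Toy logits of member reconstructions (illustrative values only).
lo, hi = member_only_interval([2.0, 4.0, 6.0, 8.0])
print(lo, hi)
```

For the toy values above the interval is [2.0, 5.0], i.e. it keeps the less confident half of the member distribution, where overlap with non-members is expected to be largest.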

Scenario 3. In this scenario, the defender is unaware of the membership status of any sample, typically occurring when the defender is tasked with protecting an already trained model without having been involved in its training process. Our approach involves randomly selecting the prediction of one sample from the generated candidates as the output. Although this method does not deliberately restrict the prediction distribution, the reconstruction process inherently contributes to reducing the prediction distribution gap between members and non-members. This effect is achieved as the model encounters newly generated samples during the reconstruction phase, which are distinct from any data it was exposed to during training. This novelty in the samples ensures a more uniform response from the model, diminishing the likelihood of differential predictions between members and non-members.

3.2.3 Prediction Aggregation

By default, we adopt the aforementioned sample selection strategy as it consistently provides privacy protection without altering the model’s accuracy. An alternative option is to aggregate the predictions of all generated samples instead of selecting one. This approach may yield superior results in specific contexts; for example, we observed that direct averaging of predictions from generated samples can enhance both privacy and utility when the diffusion model is trained on sufficient data. Further details on this are provided in Section 4.2.4.
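The aggregation alternative can be sketched as a simple average over the reconstructions. The sketch assumes the average is taken over softmax probabilities rather than raw logits, which the text does not specify; the function names are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_predictions(candidate_logits):
    """Average the class probabilities of N reconstructions: (N, C) -> (C,)."""
    probs = softmax(np.asarray(candidate_logits, dtype=float))
    return probs.mean(axis=0)
```

Unlike sample selection, averaging can change the predicted label, which is why it may affect test accuracy in either direction.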

3.2.4 Integration with Other Defenses in a Plug-and-Play Manner

Our method simply employs off-the-shelf diffusion models, offering substantial flexibility in deployment. As outlined in Section 1, we categorize defenses into three deployment phases. Defenses deployed in different stages are compatible and can be used concurrently. Our approach, characterized by minimal defense assumptions, is unique in its deployment at the pre-inference stage and offers plug-and-play capability, allowing for integration with all other existing methods. As illustrated in Figure 2, when combined with training-phase defenses, our method can be integrated into the inference pipeline immediately after the model has been trained using privacy-preserving techniques.

When combined with post-inference defenses, our approach can be employed to reconstruct input samples within the inference pipeline, followed by the application of the post-inference defenses, such as MemGuard, to introduce noise into the output prediction vector. Our experiments validate the effectiveness of our method in conjunction with other defenses, while also providing new perspectives for the deployment of future defense mechanisms.
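The composition described above can be sketched as a small pipeline. All four callables are hypothetical placeholders: `reconstruct` stands for the diffusion-based regeneration, `classify` for the (possibly privacy-trained) target model, `select` for the sample selection or aggregation step, and `post_defense` for an optional post-inference defense such as MemGuard.

```python
def defended_inference(x, reconstruct, classify, select, post_defense=None):
    """Pre-inference reconstruction, unchanged model, optional post-processing."""
    candidates = reconstruct(x)                      # N regenerated inputs
    predictions = [classify(c) for c in candidates]  # target model is untouched
    output = select(predictions)                     # selection or aggregation
    if post_defense is not None:                     # e.g. output perturbation
        output = post_defense(output)
    return output
```

Because the target model only appears as the `classify` callable, the same wrapper applies to an undefended model and to a model trained with a training-phase defense.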

4 Evaluation

4.1 Experimental Setup

Datasets. We consider the following benchmark datasets that are widely used in prior works:

  • CIFAR-10 [32]: CIFAR-10 comprises 60,000 32x32 color images across 10 classes. Each class contains 6,000 images.

  • CIFAR-100 [32]: CIFAR-100 has the same data format as CIFAR-10, but it has 100 classes, so each class has only 600 images.

  • SVHN [33]: SVHN contains 99,289 digit images of house numbers collected from Google Street View. Each image has a resolution of 32x32 and is labeled with the integer value of the digit it represents, from 0 to 9.

For the CIFAR-10/CIFAR-100 and SVHN datasets, we selected 25,000 and 5,000 samples, respectively, to train the target models. The remaining samples from each dataset were reserved as non-members or as reference data used by defenses or attacks.

Models. For target classifiers, we consider the widely used ResNet-18 [21] and DenseNet121 [22] as the target model architectures. In our default setup, each target model is trained for 100 epochs using the Adam optimizer with a learning rate of 0.001. We apply an L2 weight decay coefficient of $10^{-6}$ and use a batch size of 128. Because we found that the performance of recent defenses can be affected by the training configurations of the target models, we also employ their settings to demonstrate that our defense enhances their best outcomes.

For the diffusion models integrated into our defense, we train standard DDPMs from scratch using the default hyper-parameters from the original DDPM paper. Unless otherwise mentioned, the diffusion model is trained on the same training data as the associated defended target classifier, following the assumption that the model owner has no extra data. Our default configuration involves generating $N=50$ reconstructed images for each sample, with the number of diffusion steps $T$ set to 40. Detailed discussions on the choice of these hyperparameters are provided in Section 4.3.2. To accelerate the inference process, we also employ Denoising Diffusion Implicit Models (DDIM($k$)), where $k$ indicates the inference interval. In our setting, $k$ is set to 10, corresponding to 4 denoising steps when $T=40$.
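The relation between the diffusion steps, the DDIM interval, and the number of denoising steps can be checked with a short sketch. The exact timestep indices visited (here 40, 30, 20, 10) are an assumption; only the count of 4 steps is stated in the text. The helper names are hypothetical.

```python
def ddim_timesteps(T, k):
    """Timesteps visited when denoising from step T with stride k."""
    return list(range(T, 0, -k))

def denoiser_calls(N, T, k):
    """Denoiser evaluations per input: N reconstructions, each needing T/k steps."""
    return N * len(ddim_timesteps(T, k))
```

Under these assumptions the default setting ($N=50$, $T=40$, $k=10$) requires 4 denoising steps per reconstruction and 200 denoiser evaluations per protected input.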

Attack methods. For evaluation we consider 6 state-of-the-art attack methods. These comprise NN-based attacks [5, 15], four threshold-based attacks (loss, confidence, entropy, M-entropy) and the recent LiRA [29] attack. In line with previous practice, we exclude attacks using only partial model output as they are strictly weaker than attacks above.
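For reference, the four threshold-based attacks each compare a scalar score against a threshold. The sketch below computes these scores for one sample. It assumes the confidence score is the maximum predicted probability (some formulations use the probability of the true label instead) and uses the modified-entropy definition of Song and Mittal [17]; the function name is illustrative.

```python
import numpy as np

def attack_scores(probs, label, eps=1e-12):
    """Per-sample scores used by loss, confidence, entropy and M-entropy attacks."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    loss = -np.log(p[label])                  # cross-entropy loss
    confidence = p.max()                      # assumption: max probability
    entropy = -np.sum(p * np.log(p))          # prediction entropy
    rest = np.delete(p, label)                # probabilities of wrong classes
    m_entropy = (-(1.0 - p[label]) * np.log(p[label])
                 - np.sum(rest * np.log(np.clip(1.0 - rest, eps, 1.0))))
    return {"loss": loss, "confidence": confidence,
            "entropy": entropy, "m_entropy": m_entropy}
```

Members tend to have lower loss, entropy, and modified entropy and higher confidence, so each attack predicts membership when its score falls on the member side of a chosen threshold.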

Evaluation metrics. We use Top-1 accuracy on the validation dataset to quantify the model’s utility. For privacy, we default to using five of the described attacks (excluding LiRA) to evaluate two commonly used metrics: (i) attack accuracy and (ii) attack AUC. Additionally, we report the TPR (True Positive Rate) at 0.1% FPR and the TNR (True Negative Rate) at 0.1% FNR on the SVHN dataset, using LiRA, which is specifically designed to achieve superior results on these metrics. We applied the LiRA attack only on the SVHN dataset due to its high computational demands, as it requires the training of over 100 shadow models.
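The low-FPR metric can be computed from raw attack scores as sketched below. This is an illustrative implementation, assuming higher scores indicate membership and that the threshold is set on the non-member scores so that the false positive rate never exceeds the target; the function name is hypothetical.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.001):
    """TPR of the score-threshold attack at a false positive rate <= target_fpr."""
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    n = len(nonmembers)
    # Number of non-members we may flag; the small offset guards against
    # floating-point error in the product.
    allowed = min(int(np.floor(target_fpr * n + 1e-9)), n - 1)
    threshold = nonmembers[allowed]
    # Only scores strictly above the threshold are flagged as members.
    return float(np.mean(members > threshold))
```

The TNR at a fixed FNR is obtained symmetrically by negating the scores and swapping the roles of members and non-members.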

4.2 Experimental Results

Figure 5: Attack accuracy and AUC on three datasets against ResNet under Scenario 1. Panels: (a) Attack accuracy; (b) Attack AUC. We report the highest attack accuracy and attack AUC across all attacks. The prediction accuracy delta indicates the prediction accuracy gap compared to the undefended models, with negative numbers indicating a decrease in model utility.
Figure 6: Attack accuracy and AUC on three datasets against DenseNet under Scenario 1. Panels: (a) Attack accuracy; (b) Attack AUC. We report the highest attack accuracy and attack AUC across all attacks. The prediction accuracy delta indicates the prediction accuracy gap compared to the undefended models, with negative numbers indicating a decrease in model utility.
TABLE III: Average attack accuracy and AUC across three datasets under three scenarios on ResNet. The optimal defense results in each case are highlighted in bold.

| Defenses | Prediction Accuracy Delta (%) | w/o Diffence: Attack AUC (%) | w/o Diffence: Attack Accuracy (%) | Scenario 1: Attack AUC (%) | Scenario 1: Attack Accuracy (%) | Scenario 2: Attack AUC (%) | Scenario 2: Attack Accuracy (%) | Scenario 3: Attack AUC (%) | Scenario 3: Attack Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| Undefended | 0 | 79.57 | 77.22 | 71.33 | 67.63 | 73.17 | 69.9 | 72.86 | 69.98 |
| SELENA | -1.89 | 61.03 | 58.92 | 58.53 | 56.38 | 59.12 | 57.59 | 58.99 | 57.51 |
| AdvReg | -6.63 | 61.77 | 59.54 | 59.49 | 57.41 | 59.57 | 57.42 | 59.53 | 57.43 |
| Hamp | -0.27 | 80.08 | 76.67 | 72.63 | 67.85 | 73.98 | 69.44 | 73.63 | 69.46 |
| RelaxLoss | 0.11 | 75.29 | 72.29 | 69.06 | 65.35 | 69.73 | 67.28 | 69.17 | 67.20 |
| DP-SGD | -9.10 | 56.44 | 55.75 | 55.48 | 54.8 | 55.47 | 54.46 | 55.33 | 54.56 |
| Memguard | 0 | 72.50 | 68.49 | 67.41 | 64.63 | 68.90 | 65.81 | 69.10 | 65.81 |

4.2.1 Comparison to Baselines

We conducted an extensive evaluation of our method across three datasets. For each dataset, we evaluated the performance of models across seven different cases, including one without any protection and six others, each employing a different defense mechanism. We tested the impact of our method on the models’ test accuracy and the changes in attack accuracy and attack AUC under the three scenarios introduced in Section 3. Our results show that our method consistently enhanced privacy protection across all settings without compromising the models’ utility. We discuss the specific effects of our method below.

Figure 5 and Figure 6 show the performance of our method against ResNet and DenseNet under Scenario 1, respectively. We observe that our approach consistently reduces the attack’s AUC and accuracy across all cases, particularly against those methods that preserve model usability but offer only limited protection. For instance, when applied to ResNet, our method decreased the attack accuracy for undefended, HAMP, RelaxLoss, and SELENA models by 12.4%, 11.2%, 8.9%, and 5.7%, respectively, on average across three datasets. Similarly, the attack AUCs showed reductions of 10.1%, 9.1%, 7.9%, and 3.9% after employing our approach. The experimental results demonstrate that our method can significantly reduce privacy threats, even with reliance on only a small amount of additional data (1000 non-members in our experiments). Our experiments suggest that the optimal defense practice is combining our method with recent utility-preserving defenses. This combination leverages their advantages in maintaining utility while enhancing their privacy protection. For instance, the integrations of Diffence with MemGuard, SELENA, and RelaxLoss respectively achieved the best privacy-utility trade-offs on three different datasets when applied to DenseNet.

Table III presents the experimental results using ResNet under all three scenarios. We show the average attack AUC and accuracy of our method across three datasets in each scenario. The average defense effectiveness in Scenarios 2 and 3 is slightly weaker than in Scenario 1, because these scenarios place stronger restrictions on the defender’s knowledge. However, they still achieve significant privacy enhancements across all settings without compromising the model’s utility. In research on MIA defenses, balancing sufficient privacy protection with model utility has always been one of the most important objectives. Our results suggest that our method represents a significant step towards the ideal defense.

4.2.2 Effectiveness on Attack TPR and TNR Metrics

Figure 7: Attack TPR and TNR on the SVHN dataset. We report the highest attack TPR and attack TNR across all attacks.

Following the practice of Carlini et al. [29], we measure the reliability with which an adversary can violate the privacy of even a few users in a sensitive dataset using metrics such as TPR at 0.1% FPR and TNR at 0.1% FNR. We report on the defensive effectiveness of our method under the third scenario against these metrics on SVHN, employing six attacks including LiRA. As shown in Figure 7, our method significantly reduces the attack TPR and TNR in most cases. For instance, our method reduced the attack TPR against the undefended model and HAMP by 84.7% and 61%, respectively. Our approach has a relatively smaller impact on methods that already exhibit low TPR, but these methods often compromise the model’s utility. Therefore, our method can be advantageously combined with privacy-preserving defenses.

4.2.3 Improving Performance Beyond Previous State-of-the-Art

TABLE IV: Performance of our approach on CIFAR-100. Here we directly adopted settings from previous papers.

| Setting | Defenses | Training accuracy (%) | Test accuracy (%) | Attack AUC (%) | Attack accuracy (%) |
|---|---|---|---|---|---|
| SELENA | Undefended / +ours | 99.98 / 99.98 | 78.19 / 78.19 | 77.72 / 63.4 | 74.02 / 60.73 |
| SELENA | SELENA / +ours | 78.19 / 78.19 | 74.47 / 74.47 | 56.56 / 53.37 | 55.18 / 53.08 |
| RelaxLoss | Undefended / +ours | 65.67 / 65.67 | 32.99 / 32.99 | 71.69 / 59.5 | 67.58 / 58.04 |
| RelaxLoss | RelaxLoss / +ours | 52.68 / 52.68 | 35.7 / 35.7 | 59.93 / 54.95 | 57.43 / 53.61 |
| HAMP | Undefended / +ours | 92.98 / 92.98 | 58.51 / 58.51 | 68.47 / 67.4 | 67.09 / 66.02 |
| HAMP | HAMP / +ours | 83.38 / 57.25 | 83.38 / 57.25 | 65.72 / 62.69 | 63.51 / 60.13 |

Recent advanced defenses like SELENA, RelaxLoss, and HAMP have reported state-of-the-art defense performance in terms of the privacy-utility trade-off in their respective papers. However, our experiments suggest that the defensive robustness of RelaxLoss and HAMP is contingent upon the choice of training parameters, leading to variability in their effectiveness.

For a fair and accurate comparison, we adopted the experimental settings deployed in their original papers to assess the effectiveness of our proposed method. The results are shown in Table IV. Note that our approach maintains the original predicted labels of the model. While this does not eliminate the privacy risks posed by the accuracy gap between training and test sets, it significantly mitigates the risk of privacy leakage arising from other inconsistent prediction behaviors.

Consider the instance of SELENA: the undefended model exhibits a 21.79% gap (99.98% - 78.19%) between training and test set accuracies. Consequently, a simplistic gap attack — assuming all correctly classified samples as members — could attain an attack accuracy of 60.9%. In contrast, our defense reduces the best performance of other attacks to an accuracy of just 60.73%. This implies that attackers can gain no improvement over the baseline attack, a substantial reduction from the 4.7% improvement possible with SELENA.
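The 60.9% figure follows from a short calculation, sketched here under the assumption of a balanced evaluation set (equal numbers of members and non-members). The gap attack flags every correctly classified sample as a member, so its true positive rate equals the training accuracy and its false positive rate equals the test accuracy. The function name is illustrative.

```python
def gap_attack_accuracy(train_acc, test_acc):
    """Accuracy of the gap attack on a balanced member/non-member set.

    accuracy = 0.5 * TPR + 0.5 * (1 - FPR), with TPR = train_acc and
    FPR = test_acc, which simplifies to 0.5 + (train_acc - test_acc) / 2.
    """
    return 0.5 + (train_acc - test_acc) / 2.0
```

With a training accuracy of 99.98% and a test accuracy of 78.19%, `gap_attack_accuracy(0.9998, 0.7819)` evaluates to about 0.609, matching the baseline quoted above.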

Beyond merely lowering attack accuracy, our method can be combined with other defenses, achieving reduced attack AUC and accuracy across all scenarios without compromising the model’s accuracy. Our experiments demonstrate that our method can be applied to the optimal defense models proposed in prior works, enhancing their privacy protection.

4.2.4 Defend with Stronger Diffusion Models

In our previous experiments, we assumed that the training set size of the defender’s diffusion model and the size of the classifier’s training set are the same. Furthermore, to ensure no decrease in utility, we only select reconstructed samples that match the original sample labels.

Interestingly, however, we discovered that when a stronger diffusion model is employed in the defense (specifically, a model trained on a dataset larger than that of the classifier), directly averaging the predictions of all generated samples can enhance both the classifier’s privacy and its utility.

We utilized a publicly available diffusion model trained on 50,000 samples from CIFAR-10 to defend our classifiers, which were trained on 25,000 samples. The results, shown in Table V, indicate that in all cases the classifiers not only achieved enhanced privacy but also exhibited higher test accuracy. Given that diffusion model training does not require labeled data, it can be performed on a large, public, unlabeled dataset. We leave a more detailed discussion on how diffusion models can improve classifiers’ utility for future work.

TABLE V: Performance of our approach on CIFAR-10 when using a stronger diffusion model.

| Defenses | Test Accuracy (%) | Attack Accuracy (%) | Attack AUC (%) |
|---|---|---|---|
| Undefended / +Ours | 81.4 / 83.18 | 72.98 / 66.19 | 75.85 / 69.3 |
| AdvReg / +Ours | 78.4 / 80.32 | 60.31 / 57.52 | 60.99 / 58.17 |
| DPSGD / +Ours | 74.66 / 75.72 | 55.31 / 54.38 | 55.3 / 54.47 |
| SELENA / +Ours | 79.2 / 79.8 | 60.36 / 57.17 | 60.08 / 57.02 |
| RelaxLoss / +Ours | 81.24 / 82.08 | 65.13 / 61.48 | 69.52 / 63.7 |
| HAMP / +Ours | 81.12 / 82.74 | 65.2 / 62.5 | 69.44 / 62.16 |

4.3 Ablation study

As introduced in Section 3, our approach employs a diffusion model to reconstruct samples, protecting the membership privacy of the original samples. We generate multiple samples and, based on the defense objectives and the information available to the defender, select the prediction of the most suitable sample as the final output. This process involves considering several factors, including the choice of interval, the number of reconstructions $N$ generated for each sample, and the diffusion step $T$. In this section, we further discuss the impact of these parameters and of the sample selection strategy.

4.3.1 Sample Selection Strategies

Figure 8: Cumulative Distribution Function (CDF) of the number of generations required for reconstructions to fall within different intervals when $T=60$ on CIFAR100. The 'percentile' indicates the anticipated probability of reconstructions falling within the given interval, as determined by analyzing the samples available to the defender.

In Scenario 1, the main idea of our method involves setting an interval within which the prediction logits of both members and non-members are encouraged to fall. This strategy aims to align the prediction distributions of members and non-members towards a range in the middle. It is important to note that our method uses a fixed number of reconstructions $N$ and diffusion steps $T$ as the basis for selecting the optimal interval. Given the fixed $N$, not all samples can fall within the chosen interval. For these outliers, our approach is to select the closest sample.

An alternative approach might involve continuously generating reconstructions for each sample after setting the interval, to force their predictions into this range. However, we found this method to be highly inefficient. Figure 8 illustrates the cumulative distribution of samples falling within various intervals as the number of reconstructions increases. It reveals that if a sample does not fall within the interval in its initial generation, it is unlikely to do so in subsequent iterations. Therefore, continuously generating new samples is not a practical solution.

In terms of interval selection, we tested various intervals and opted for the one where the JS divergence between members’ and non-members’ prediction logits is minimized. This approach is predicated on our observation that the JS divergence of prediction logits between member and non-member samples correlates significantly with attack performance. This correlation has also been noted by He et al. [14] in their research.

Figure 9 depicts attack AUC and accuracy on three datasets in relation to the different JS divergence between member and non-member prediction distributions when our method is solely applied. Each point in the figure corresponds to a selectable interval, where choosing different intervals results in varying JS divergences. A key observation is the negative correlation between JS divergence and privacy protection – smaller JS divergences typically indicate stronger membership privacy protection. These results substantiate the effectiveness of our interval selection strategy.
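The interval selection can be sketched as a search over candidate intervals that minimizes the JS divergence between the selected member and non-member logits. This is a simplified illustration: it assumes one scalar logit per reconstruction, estimates the distributions with shared-range histograms, and takes the list of candidate intervals as given. All function names and the binning choice are assumptions.

```python
import numpy as np

def js_divergence(a, b, bins=20):
    """Jensen-Shannon divergence (natural log) between two 1-D samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    value_range = (float(min(a.min(), b.min())), float(max(a.max(), b.max())))
    p, _ = np.histogram(a, bins=bins, range=value_range)
    q, _ = np.histogram(b, bins=bins, range=value_range)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(u, v):
        mask = u > 0  # v = m is positive wherever u is positive
        return float(np.sum(u[mask] * np.log(u[mask] / v[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def closest_to_interval(cands, lo, hi):
    """For each row of (n, N) logits, keep the candidate closest to [lo, hi]."""
    cands = np.asarray(cands, dtype=float)
    dist = np.maximum(lo - cands, 0.0) + np.maximum(cands - hi, 0.0)
    return cands[np.arange(len(cands)), dist.argmin(axis=1)]

def pick_interval(member_cands, nonmember_cands, intervals):
    """Return the candidate interval with the smallest member/non-member JS."""
    scores = [
        js_divergence(closest_to_interval(member_cands, lo, hi),
                      closest_to_interval(nonmember_cands, lo, hi))
        for lo, hi in intervals
    ]
    return intervals[int(np.argmin(scores))]
```

In practice the candidate intervals would be enumerated from the logit range observed on the defender's reference samples; the histogram estimator could be replaced by any other density estimate.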

Figure 9: Attack AUC and accuracy under different levels of Jensen-Shannon (JS) divergence between member and non-member prediction distributions. Panels: (a) CIFAR10; (b) CIFAR100; (c) SVHN. We tested on three datasets using ResNet, with diffusion steps $T=40$ and the number of reconstructions $N=50$.

4.3.2 Effect of Hyperparameters

Figure 10: Attack AUC and attack accuracy against our defended models with different numbers of generated samples $N$ and diffusion steps $T$. Panels: (a) Attack AUC; (b) Attack accuracy. We evaluate using ResNet on the CIFAR100 dataset, varying $N$ from 25 to 200 and $T$ from 10 to 80.

We then varied the values of $N$ and $T$ to observe the changes in defense performance. Note that our method does not alter the prediction labels and therefore does not impact the model’s test accuracy.

An increase in $T$ leads to a more substantial alteration of the original sample, thereby enlarging the divergence between the generated samples and the original ones. A higher $N$ enlarges the pool of alternative images, enriching the selection process for our sample selection strategy. This, in turn, mitigates randomness and enables a more consistent selection of suitable replacement samples.

As depicted in Figure 10, both the attack AUC and accuracy decrease as $T$ and $N$ increase, and a more pronounced decrease is observed with larger $T$ values. However, increasing $T$ and $N$ also increases the sample generation time, potentially increasing inference latency. This introduces a trade-off between the defense’s effectiveness and inference delay. The balance between improved privacy and reduced latency should be tailored to the requirements of the specific task in practice.

5 Discussion

5.1 Overhead of Different Defenses

TABLE VI: Training and inference overhead comparison of different defenses. For our method, parameters are set to their default values used in our experiments, with $N=50$ and $T=40$.

| Defenses | CIFAR10: Training overhead | CIFAR10: Inference overhead | CIFAR100: Training overhead | CIFAR100: Inference overhead | SVHN: Training overhead | SVHN: Inference overhead |
|---|---|---|---|---|---|---|
| Undefended | 0.6h | 25.8ms | 0.6h | 24ms | 0.4h | 24ms |
| SELENA | 10.9h | 25.8ms | 11.9h | 24ms | 9.3h | 24ms |
| AdvReg | 10.9h | 25.8ms | 12.6h | 24ms | 6.8h | 24ms |
| Hamp | 0.8h | 25.8ms | 0.77h | 24ms | 0.5h | 24ms |
| RelaxLoss | 0.53h | 25.8ms | 0.6h | 24ms | 0.44h | 24ms |
| DP-SGD | 0.83h | 25.8ms | 0.94h | 24ms | 0.56h | 24ms |
| Memguard | - | 732ms | - | 612ms | - | 586ms |
| Ours | - | 245ms | - | 221ms | - | 241ms |

We compared the overhead of 7 defense methods. Among them, HAMP, SELENA, RelaxLoss, AdvReg, and DP-SGD are deployed in the training phase, while MemGuard and our method are deployed in the inference phase. All models are trained for 100 epochs. Times are measured on a single NVIDIA RTX-8000 GPU with 40GB memory.

As shown in Table VI, the training time of the defenses deployed in the training phase is $1\times \sim 21\times$ that of the undefended model. MemGuard and our method have no impact on the training time; their overhead is at inference time. Between the two, our method has a shorter inference running time than MemGuard, as MemGuard needs to solve a complex optimization problem for each sample.

Note that we used the standard DDIM sampling method in our approach. With the development of fast sampling technology, the overhead of our defense might be further reduced. We leave the use of fast sampling in our defense framework to future work.

6 Limitation

While our method demonstrates significant efficacy in enhancing privacy protection, it is important to acknowledge certain limitations inherent in our approach. A key aspect of our approach is the use of diffusion models. In many cases, off-the-shelf diffusion models are sufficient and can be directly employed without additional training. This availability significantly reduces the resources and time required, making our method more accessible and practical. However, in scenarios where the dataset requiring privacy protection does not have a readily available corresponding diffusion model, training or fine-tuning becomes necessary.

It is essential to note that once fine-tuned, the diffusion model is not limited to a singular application but can be leveraged for various other tasks and scenarios. This versatility offers a significant advantage, as the investment in training the model can be amortized over multiple uses. For instance, a fine-tuned diffusion model on a medical dataset could be utilized not only for privacy protection but also for tasks such as data augmentation, anomaly detection, or even generating synthetic data for research purposes [34, 35].

Nevertheless, the initial requirement of model training or fine-tuning, particularly on private or sensitive datasets, remains a critical factor to consider. This process involves careful handling and management of the data to ensure that the privacy of the data subjects is not compromised during the model development phase.

In summary, while our method requires an initial investment in training or fine-tuning the diffusion model, the subsequent benefits and applications of the model extend well beyond the scope of privacy protection, offering a range of possibilities in data utilization.

7 Related Work

A membership inference attack [4] aims to determine if a specific sample was in a model’s training data, posing risks of sensitive individual information leakage. This section overviews various such attacks and defenses, highlighting their diversity across scenarios.

7.1 Membership Inference Attacks

Shokri et al. [4] introduced a black-box MIA that employs a shadow training technique to train an attack model to differentiate the model’s output, categorizing it as either a member or non-member. In a different approach, Salem et al. [15] streamlined the process by training only a single shadow model, assuming the attacker lacks access to similar distribution data as the training dataset, yet still achieving notable effectiveness. Expanding on these concepts, Nasr et al. [36] presented a white-box MIA targeting ML models. For each data sample, they computed the corresponding gradients over the parameters of the white-box target classifier, utilized as features of the data sample for the purpose of membership inference.

Choquette-Choo et al. [18] developed a label-only MIA concept, where the target model reveals only the predicted label. Their attack’s efficacy relies on the model’s increased resilience to perturbations like augmentations and noise in the training data. Complementing this, Li et al. [19] presented two specific label-only MIAs: the transfer-based MIA and the perturbation-based MIA. Remarkably, these label-only attacks achieve a balanced accuracy on par with that of the shadow-model strategies.

Song and Mittal [17] employ a modified entropy measure, using shadow models to approximate the distributions of entropy values for members and non-members across each class. Essentially, the attacker conducts a hypothesis test between the distributions of (per-class) members and non-members, given a model $f$ and a target sample $(x,y)$. Yeom et al. [16] proposed a loss-based membership inference attack, leveraging the tendency of machine learning models to minimize training loss. This attack identifies training examples by observing lower loss values. Carlini et al. [29] developed a Likelihood Ratio-based attack (LiRA). This attack outperforms previous attacks, especially at low False Positive Rates. In the LiRA approach, an attacker trains $N$ shadow models using samples from distribution $D$. Half include the target point $(x,y)$, and half do not. Two Gaussians are fitted to the confidences of these two groups of models. The membership of $(x,y)$ in the target model is then inferred using a likelihood-ratio test based on these fits.
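The likelihood-ratio step of LiRA can be sketched as follows. This is a simplified illustration of the per-sample test, not the reference implementation: it assumes the shadow-model confidences for the target point have already been collected, and it includes the logit rescaling of the confidence used by Carlini et al. [29] as an optional helper. Function names are hypothetical.

```python
import math
import numpy as np

def logit_scale(p, eps=1e-6):
    """Rescale a probability to log(p / (1 - p)) so it is closer to Gaussian."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)

def lira_score(target_conf, in_confs, out_confs, eps=1e-6):
    """Log-likelihood ratio of 'member' versus 'non-member' for one sample.

    in_confs / out_confs: confidences on (x, y) from shadow models trained
    with / without the sample. A positive score favors membership.
    """
    mu_in, sd_in = float(np.mean(in_confs)), float(np.std(in_confs)) + eps
    mu_out, sd_out = float(np.mean(out_confs)), float(np.std(out_confs)) + eps
    return (gaussian_logpdf(target_conf, mu_in, sd_in)
            - gaussian_logpdf(target_conf, mu_out, sd_out))
```

Thresholding this score across samples yields the ROC curve from which the TPR at a low FPR is read.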

7.2 Existing Defenses

Initial research on defenses against MIA showed that certain regularization techniques, like dropout [37], can curb overfitting, resulting in modest privacy enhancements in neural networks [4]. Another method, early stopping [38], also serves to prevent model overfitting, potentially reducing MIA accuracy, albeit at the cost of compromising model utility.

Several studies have proposed defenses during the training phase. Nasr et al. [5] proposed an adversarial regularization technique, a min-max game-based training algorithm aiming to reduce training loss and increase MIA loss. Shejwalkar [39] introduced a defense against MIAs through knowledge distillation, transferring knowledge from an undefended private-dataset-trained model to another using a public dataset. Tang et al. [8] proposed a knowledge distillation-based defense balancing privacy and utility. They partitioned the training dataset into K subsets, trained K sub-models on each, and used these to train a separate public model with scores from non-training samples. Chen et al. [9] developed RelaxLoss, a training scheme balancing privacy and utility by minimizing loss distribution disparities to reduce membership privacy risks.

Some other defenses are applied separately from the training phase. Jia et al. [6] proposed MemGuard, a method that operates in two stages. It first crafts a noise vector to transform confidence scores into adversarial examples under utility-loss constraints. Then, it integrates this noise into the confidence score vector based on a derived analytical probability. Chen and Pattabiraman [10] introduced HAMP, a defense mechanism encompassing both training and test-time defenses. Its training component aims to lower model confidence on training samples, countering the overconfidence induced by hard labels in standard training.

Differential privacy (DP) [40, 41] is a prominent method extensively employed to offer theoretical privacy guarantees in ML models. Within this framework, noise can be introduced to both the objective function [42] and gradients [7, 43]. While DP provides robust privacy assurances, it has been observed to significantly compromise utility [5].

8 Conclusion

In this study, we introduced a novel privacy defense method that addresses the challenge of membership inference attacks (MIAs) in machine learning (ML). Our approach effectively diminishes the distinction in prediction behaviors between members and non-members through input reconstruction using diffusion models. By generating multiple reconstructions and selectively utilizing predictions based on defined criteria, our method significantly narrows the prediction distribution gaps that are often exploited in MIAs.

We categorized defenses into three deployment phases and, for the first time, proposed integrating different defenses to enhance overall protection. By combining our method with others, we are able to effectively address both the train-to-test accuracy gap and the prediction distribution gap. Our experiments across various datasets demonstrated the efficacy of our approach in reducing membership privacy leakage, without sacrificing the model’s accuracy.

Our method demonstrates considerable flexibility and effectiveness. However, we recognize its limitations in scenarios where specific diffusion models are not readily available, necessitating additional training or fine-tuning. Despite this, the potential for utilizing fine-tuned diffusion models in various contexts offers a substantial advantage.

In conclusion, our research contributes to the growing body of knowledge in privacy protection in ML, offering a robust, flexible solution to counter MIAs. Future work could further explore optimizing the selection process and investigating its effectiveness against a wider range of privacy threats in ML.

References

  • [1] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song, “The secret sharer: Evaluating and testing unintended memorization in neural networks,” pp. 267–284, 2019.
  • [2] C. Song, T. Ristenpart, and V. Shmatikov, “Machine learning models that remember too much,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS 17), 2017, pp. 587–601.
  • [3] K. Leino and M. Fredrikson, “Stolen memories: Leveraging model memorization for calibrated white-box membership inference,” in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 1605–1622.
  • [4] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE symposium on security and privacy (SP).   IEEE, 2017, pp. 3–18.
  • [5] M. Nasr, R. Shokri, and A. Houmansadr, “Machine learning with membership privacy using adversarial regularization,” in Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, 2018, pp. 634–646.
  • [6] J. Jia, A. Salem, M. Backes, Y. Zhang, and N. Z. Gong, “Memguard: Defending against black-box membership inference attacks via adversarial examples,” in Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, 2019, pp. 259–274.
  • [7] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 308–318.
  • [8] X. Tang, S. Mahloujifar, L. Song, V. Shejwalkar, M. Nasr, A. Houmansadr, and P. Mittal, “Mitigating membership inference attacks by Self-Distillation through a novel ensemble architecture,” in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 1433–1450.
  • [9] D. Chen, N. Yu, and M. Fritz, “Relaxloss: Defending membership inference attacks without losing utility,” arXiv preprint arXiv:2207.05801, 2022.
  • [10] Z. Chen and K. Pattabiraman, “Overconfidence is a dangerous thing: Mitigating membership inference attacks by enforcing less confident prediction,” arXiv preprint arXiv:2307.01610, 2023.
  • [11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
  • [12] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
  • [13] W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar, “Diffusion models for adversarial purification,” in International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, ser. Proceedings of Machine Learning Research, vol. 162.   PMLR, 2022, pp. 16 805–16 827.
  • [14] X. He, Z. Li, W. Xu, C. Cornelius, and Y. Zhang, “Membership-doctor: Comprehensive assessment of membership inference against machine learning models,” 2022.
  • [15] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes, “Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models,” arXiv preprint arXiv:1806.01246, 2018.
  • [16] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in machine learning: Analyzing the connection to overfitting,” in 2018 IEEE 31st computer security foundations symposium (CSF).   IEEE, 2018, pp. 268–282.
  • [17] L. Song and P. Mittal, “Systematic evaluation of privacy risks of machine learning models,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2615–2632.
  • [18] C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot, “Label-only membership inference attacks,” in International conference on machine learning.   PMLR, 2021, pp. 1964–1974.
  • [19] Z. Li and Y. Zhang, “Membership leakage in label-only exposures,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 880–895.
  • [20] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [22] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
  • [23] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
  • [24] A. Brock, J. Donahue, and K. Simonyan, “Large scale gan training for high fidelity natural image synthesis,” arXiv preprint arXiv:1809.11096, 2018.
  • [25] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 8110–8119.
  • [26] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” Advances in neural information processing systems, vol. 31, 2018.
  • [27] A. Vahdat and J. Kautz, “Nvae: A deep hierarchical variational autoencoder,” Advances in neural information processing systems, vol. 33, pp. 19 667–19 679, 2020.
  • [28] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International conference on machine learning.   PMLR, 2015, pp. 2256–2265.
  • [29] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer, “Membership inference attacks from first principles,” in 2022 IEEE Symposium on Security and Privacy (SP).   IEEE, 2022, pp. 1897–1914.
  • [30] X. Yang, D. Zhou, J. Feng, and X. Wang, “Diffusion probabilistic model made slim,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023.   IEEE, 2023, pp. 22 552–22 562. [Online]. Available: https://doi.org/10.1109/CVPR52729.2023.02160
  • [31] J. Lin, “Divergence measures based on the shannon entropy,” IEEE Transactions on Information theory, vol. 37, no. 1, pp. 145–151, 1991.
  • [32] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • [33] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” 2011.
  • [34] B. Trabucco, K. Doherty, M. Gurinas, and R. Salakhutdinov, “Effective data augmentation with diffusion models,” 2023.
  • [35] J. Wolleb, F. Bieder, R. Sandkühler, and P. C. Cattin, “Diffusion models for medical anomaly detection,” in International Conference on Medical image computing and computer-assisted intervention.   Springer, 2022, pp. 35–45.
  • [36] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning,” in 2019 IEEE symposium on security and privacy (SP).   IEEE, 2019, pp. 739–753.
  • [37] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [38] R. Caruana, S. Lawrence, and C. Giles, “Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping,” Advances in neural information processing systems, vol. 13, 2000.
  • [39] V. Shejwalkar and A. Houmansadr, “Membership privacy for machine learning models through knowledge transfer,” in Proceedings of the AAAI conference on artificial intelligence, vol. 35, no. 11, 2021, pp. 9549–9557.
  • [40] C. Dwork, “Differential privacy,” in International colloquium on automata, languages, and programming.   Springer, 2006, pp. 1–12.
  • [41] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3.   Springer, 2006, pp. 265–284.
  • [42] R. Iyengar, J. P. Near, D. Song, O. Thakkar, A. Thakurta, and L. Wang, “Towards practical differentially private convex optimization,” in 2019 IEEE Symposium on Security and Privacy (SP).   IEEE, 2019, pp. 299–316.
  • [43] S. Song, K. Chaudhuri, and A. D. Sarwate, “Stochastic gradient descent with differentially private updates,” in 2013 IEEE global conference on signal and information processing.   IEEE, 2013, pp. 245–248.