11institutetext: The Chinese University of Hong Kong Shenzhen, Guangdong, China 22institutetext: Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland

DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection

Sihan Shang 11    Jiancheng Yang 22    Zhenglong Sun 11    Pascal Fua 22
Abstract

In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by “cooking” the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must be also “cooked” through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The mechanism behind DataCook is by crafting anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv) that aim to confuse models. Similar to Adv, AntiAdv introduces imperceptible perturbations, ensuring that the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D/3D data and the high-resolution variants. The outcomes indicate that DataCook effectively meets its objectives, preventing models trained on AntiAdv from analyzing unauthorized data effectively, without compromising the validity and accuracy of the data in legitimate scenarios. Code and data are available at https://github.com/MedMNIST/DataCook.

Keywords:
Data Security Copyright Protection Data-Centric ML.

1 Introduction

In the healthcare field, the rapid development and widespread application of deep learning technology have brought the issues of healthcare data misuse and copyright protection into sharp focus. Healthcare data sensitivity and confidentiality necessitate not only the protection of patients’ privacy rights but also the safeguarding of data providers’ and creators’ intellectual property rights [36, 48, 49]. Current data copyright protection mechanisms, such as data encryption[42, 43], anonymization [37] and Digital Rights Management [38, 41], although can prevent illegal copying and dissemination, yet they do not address the potential data extraction and generalization capabilities of deep learning models, making it difficult to effectively tackle the misuse of data in unauthorized training.

As illustrated in Fig. 1, watermarking is a widely adopted approach for copyright protection that embeds distinct watermarks into images to offer visual security. This method includes specific modifications designed to prevent unauthorized deep learning models from analyzing the images effectively [39]. However, this technique reduces the images’ usability for legitimate training purposes which means the model trained on watermarking data has crush perform when deploying the model for original test data or watermarking test data. To restore data value for model deployment, it is necessary to remove the watermarks for model development. However, this leaves the data as vulnerable as those without any watermarking, exposing them to the risk of unauthorized use and data breaches. The challenge lies in protecting the data from misuse in unauthorized training scenarios without compromising its deployment utility. To tackle this issue, we introduce a novel setting DataCook aimed at resolving these challenges.

Refer to caption
Figure 1: A Comparison of Data Copyright Protection. Unprotected: Unprotected raw data can be directly utilized by unauthorized third parties for model development and deployment phases. Watermarking: Users receive watermarked data and authorized removal methods to obtain raw data for model development. yet post-removal, raw data remains at risk of breach and access by unauthorized third parties. DataCook: Users receive DataCook-processed data for model development. If data is breached, Models trained on this processed data are designed to work solely with datasets through DataCook-processed.

DataCook, leverages a novel strategy of generating anti-adversarial examples to safeguard healthcare data copyright during the model deployment. Essentially, DataCook operates by “cooking” the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must be also “cooked” through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The analogy inspiring DataCook’s name draws from the preference for cooked over raw food observed in daily life, symbolizing how processed data retains its value for authorized uses but becomes indigestible to models not adapted to its original form. Through experiments on the publicly available MedMNIST [9, 10] medical image dataset, we have demonstrated that DataCook effectively prevents unauthorized deep learning models from accurately recognizing raw data, while preserving the original characteristics and utility of the data. This not only showcases the potential application of anti-adversarial examples in the copyright protection of healthcare data but also paves a new path for future work in the protection of healthcare data privacy.

2 Method

2.1 Background

To clearly present DataCook, we contrast it with unprotected data and watermarking [39] as shown in Fig. 1, highlighting DataCook’s difference and advantages. To better explain the comparison, We define a raw image dataset as 𝒟r={(xir,yi)}i=1Nsuperscript𝒟𝑟superscriptsubscriptsubscriptsuperscript𝑥𝑟𝑖subscript𝑦𝑖𝑖1𝑁\mathcal{D}^{r}=\{(x^{r}_{i},y_{i})\}_{i=1}^{N}caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = { ( italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT, where the input space 𝒳r={xir}i=1Ndsuperscript𝒳𝑟superscriptsubscriptsubscriptsuperscript𝑥𝑟𝑖𝑖1𝑁superscript𝑑\mathcal{X}^{r}=\{x^{r}_{i}\}_{i=1}^{N}\subset\mathbb{R}^{d}caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = { italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT corresponds to the raw images and the label space 𝒴={yi}i=1N={1,,c}𝒴superscriptsubscriptsubscript𝑦𝑖𝑖1𝑁1𝑐\mathcal{Y}=\{y_{i}\}_{i=1}^{N}=\{1,\ldots,c\}caligraphic_Y = { italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT = { 1 , … , italic_c } represents their associated labels. Here, N𝑁Nitalic_N denotes the number of samples, drawn from the distribution 𝒟rsuperscript𝒟𝑟\mathcal{D}^{r}caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT, covering the combined space of inputs and labels, 𝒳r×𝒴superscript𝒳𝑟𝒴\mathcal{X}^{r}\times\mathcal{Y}caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT × caligraphic_Y. A classification model fr=fθrr:𝒳rc:superscript𝑓𝑟subscriptsuperscript𝑓𝑟superscript𝜃𝑟superscript𝒳𝑟superscript𝑐f^{r}=f^{r}_{\theta^{r}}:\mathcal{X}^{r}\rightarrow\mathbb{R}^{c}italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT : caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT with logits output is parameterized by θrsuperscript𝜃𝑟\theta^{r}italic_θ start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT, optimized based on 𝒟rsuperscript𝒟𝑟\mathcal{D}^{r}caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT. This process is denoted as θr=θr({𝒳r,𝒴})superscript𝜃𝑟superscript𝜃𝑟superscript𝒳𝑟𝒴\theta^{r}=\theta^{r}(\{\mathcal{X}^{r},\mathcal{Y}\})italic_θ start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = italic_θ start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( { caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , caligraphic_Y } ). Additionally, a protected image dataset, derived from the raw image dataset via specific processing steps, introduces a distribution 𝒟psuperscript𝒟𝑝\mathcal{D}^{p}caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT. This protected image dataset maintains the same label space 𝒴𝒴\mathcal{Y}caligraphic_Y but occupies a modified input space 𝒳p={xip}i=1Ndsuperscript𝒳𝑝superscriptsubscriptsubscriptsuperscript𝑥𝑝𝑖𝑖1𝑁superscript𝑑\mathcal{X}^{p}=\{x^{p}_{i}\}_{i=1}^{N}\subset\mathbb{R}^{d}caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = { italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, ensuring that the dataset spans the domain 𝒳p×𝒴superscript𝒳𝑝𝒴\mathcal{X}^{p}\times\mathcal{Y}caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT × caligraphic_Y. A corresponding classification model fθpp:𝒳pc:subscriptsuperscript𝑓𝑝superscript𝜃𝑝superscript𝒳𝑝superscript𝑐f^{p}_{\theta^{p}}:\mathcal{X}^{p}\rightarrow\mathbb{R}^{c}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_POSTSUBSCRIPT : caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT with logits output, parameterized by θp=θp({𝒳p,𝒴})superscript𝜃𝑝superscript𝜃𝑝superscript𝒳𝑝𝒴\theta^{p}=\theta^{p}(\{\mathcal{X}^{p},\mathcal{Y}\})italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( { caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_Y } ), is trained on 𝒟psuperscript𝒟𝑝\mathcal{D}^{p}caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT and denoted as fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT.

Unprotected: Raw data can be directly utilized by unauthorized third parties for model development and deployment phases.

Watermarking [39]: Users receive watermarked data along with authorized methods for watermark removal to access the raw data. In the deployment stage, if model fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT is trained on watermark-protected data, model fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT exhibits an inability to classify both watermark-test data and raw test data effectively (crush performance). To preserve the model’s classification capability (normal performance), watermarks must be removed before model development. Once the watermark is removed and occurs data breaches, the data is vulnerable without any protective measures.

DataCook: The proposed DataCook transforms raw data into protected data before dataset distribution. In the deployment stage, models fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT trained on DataCook-protected data achieve normal performance on DataCook-test data but have crush performance on raw test data. Even occurs a data breach after distribution and unauthorized third-party use leaks data for model development, new data still need DataCook processed to ensure the model’s classification capability in the model development stage where only data from authorized users can undergo DataCook processing. This process grants copyright holders control over authorization during the deployment phase and effectively mitigates data breach risks.

2.2 DataCook: Problem Setting

To achieve DataCook, the primary objective is to minimize the performance of model fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT when it is evaluated on the raw image dataset 𝒟rsuperscript𝒟𝑟\mathcal{D}^{r}caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT to prioritize data copyright protection. Concurrently, it is vital to ensure that the performance of model fpsuperscript𝑓𝑝f^{p}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT on the protected image dataset 𝒟psuperscript𝒟𝑝\mathcal{D}^{p}caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT closely matches the performance of model frsuperscript𝑓𝑟f^{r}italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT on the raw image dataset 𝒟psuperscript𝒟𝑝\mathcal{D}^{p}caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT to preserves model normal performance. Additionally, to ensure similarity, the distance between the raw image space 𝒳rsuperscript𝒳𝑟\mathcal{X}^{r}caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT and the protected image space 𝒳psuperscript𝒳𝑝\mathcal{X}^{p}caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT must be minimized. we adopted the Structural Similarity (SSIM) index as a distance metric between the raw image space and the protected image space, aiming to minimize the intrusiveness of the transformation. Thus, The optimization task can be formalized as:

minimize𝒳psuperscript𝒳𝑝minimize\displaystyle\underset{\mathcal{X}^{p}}{\text{minimize}}start_UNDERACCENT caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_UNDERACCENT start_ARG minimize end_ARG (fp,𝒟r)superscript𝑓𝑝superscript𝒟𝑟\displaystyle\mathcal{E}(f^{p},\mathcal{D}^{r})caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) (1)
w.r.t. |(fp,𝒟p)(fr,𝒟r)|ϵ,superscript𝑓𝑝superscript𝒟𝑝superscript𝑓𝑟superscript𝒟𝑟italic-ϵ\displaystyle|\mathcal{E}(f^{p},\mathcal{D}^{p})-\mathcal{E}(f^{r},\mathcal{D}% ^{r})|\leq\epsilon,| caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) - caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) | ≤ italic_ϵ , (2)
d(𝒳r,𝒳p)α𝑑superscript𝒳𝑟superscript𝒳𝑝𝛼\displaystyle d(\mathcal{X}^{r},\mathcal{X}^{p})\leq\alphaitalic_d ( caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) ≤ italic_α (3)

where (f,𝒟)𝑓𝒟\mathcal{E}(f,\mathcal{D})caligraphic_E ( italic_f , caligraphic_D ) denotes the evaluation metric for model f𝑓fitalic_f on dataset 𝒟𝒟\mathcal{D}caligraphic_D; α𝛼\alphaitalic_α is a threshold for the distance between 𝒳rsuperscript𝒳𝑟\mathcal{X}^{r}caligraphic_X start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT and 𝒳psuperscript𝒳𝑝\mathcal{X}^{p}caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT; ϵitalic-ϵ\epsilonitalic_ϵ Represents an acceptable performance discrepancy, ideally zero; fp=fθppsuperscript𝑓𝑝subscriptsuperscript𝑓𝑝superscript𝜃𝑝f^{p}=f^{p}_{\theta^{p}}italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is parameterized by θp=θp({𝒳p,𝒴})superscript𝜃𝑝superscript𝜃𝑝superscript𝒳𝑝𝒴\theta^{p}=\theta^{p}(\{\mathcal{X}^{p},\mathcal{Y}\})italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = italic_θ start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( { caligraphic_X start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_Y } ).

2.3 Crafting Anti-Adversarial Examples

Our strategy capitalizes on the potential of adversarial examples, which we have found to be particularly suited for resolving this optimization task. Adversarial examples are crafted by introducing a subtle perturbation ζ𝜁\zetaitalic_ζ to the input raw image xirsubscriptsuperscript𝑥𝑟𝑖x^{r}_{i}italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT as shown in Eq. 4, aimed at minimally yet significantly impacting the model’s prediction accuracy [47].

xip=argmaxζ(fθr(xir+ζ),y^)subscriptsuperscript𝑥𝑝𝑖subscript𝜁subscriptsuperscript𝑓𝑟𝜃subscriptsuperscript𝑥𝑟𝑖𝜁^𝑦x^{p}_{i}=\arg\max_{\zeta}\mathcal{L}(f^{r}_{\theta}(x^{r}_{i}+\zeta),\hat{y})\ italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT caligraphic_L ( italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_ζ ) , over^ start_ARG italic_y end_ARG ) (4)

Perturbations are informed by data gradients, making them a targeted feature rather than mere noise [28]. Their subtlety and difficulty in detection ensure that adversarial examples satisfy the similarity criteria in Eq. 3 and adversarial examples serving in adversarial training to bolster model robustness [31, 46, 45, 2] which align with Eq. 2 to maintain model performance. However, the potential of adversarial training to reduce model performance led us to develop the concept of anti-adversarial examples [40, 44]. Contrary to traditional adversarial examples that aim to degrade performance, anti-adversarial examples are designed to improve it, directly addressing the criteria of Eq. 2. Our method confronts the complex challenge delineated by Eq. 1, with forthcoming experiments aimed at validating that our anti-adversarial examples not only adhere to the guidelines of Eq. 2 and Eq. 3 but also achieve the overarching goals of Eq. 1.

As illustrated in Fig. 2, we detail the process of generating anti-adversarial examples. Firstly, employing a pre-trained surrogate model frsuperscript𝑓𝑟f^{r}italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT to predict the output of a raw image xirsubscriptsuperscript𝑥𝑟𝑖x^{r}_{i}italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and use the highest probability output as a pseudo-label y^isubscript^𝑦𝑖\hat{y}_{i}over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, anti-adversarial examples are crafted by maximizing the logistic probability of the pseudo label as shown in Eq. 5, contrasting with adversarial examples that aim to minimize this probability in Eq. 4:

xip=argminζ(fθr(xir+ζ),y^)subscriptsuperscript𝑥𝑝𝑖subscript𝜁subscriptsuperscript𝑓𝑟𝜃subscriptsuperscript𝑥𝑟𝑖𝜁^𝑦x^{p}_{i}=\arg\min_{\zeta}\mathcal{L}(f^{r}_{\theta}(x^{r}_{i}+\zeta),\hat{y})\ italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = roman_arg roman_min start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT caligraphic_L ( italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_ζ ) , over^ start_ARG italic_y end_ARG ) (5)

where \mathcal{L}caligraphic_L is the loss function tailored to maximize the logistic probability of y^isubscript^𝑦𝑖\hat{y}_{i}over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT directly. We set SSIM threshold of 0.8 to maintain visual similarity.

Refer to caption
Figure 2: Generating Anti-Adversarial Examples in DataCook. Utilizing the initial highest logic probability output from the surrogate model as a pseudo label y^^𝑦\hat{y}over^ start_ARG italic_y end_ARG serves as the direction for anti-adversarial attack iterations under the constraints of termination conditions.

3 Experiments

In this section, we evaluate our DataCook in 2D datasets(28×28282828\times 2828 × 28) , 3D datasets(28×28×2828282828\times 28\times 2828 × 28 × 28) and high-resolution datasets(224×224224224224\times 224224 × 224). Our experiment results show that the anti-adversarial examples crafted through DataCook fulfill the objectives of the optimization task. To assess the effectiveness of anti-adversarial examples, we designed comparative experiments using various optimization conditions.

3.1 Setup

Evaluation Metrics.

Our evaluation of anti-adversarial examples hinges on two key metrics: Copyright Protection (CP) and Performance Preservation (PP). CP represent the definition of Eq.1. To facilitate effect comparison, CP is determined as (fp,𝒟r)(fr,𝒟r)superscript𝑓𝑝superscript𝒟𝑟superscript𝑓𝑟superscript𝒟𝑟\mathcal{E}(f^{p},\mathcal{D}^{r})-\mathcal{E}(f^{r},\mathcal{D}^{r})caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) - caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) and lower values indicate better protection. PP outlined in Eq. 2, we define PP = -|(fp,𝒟p)(fr,𝒟r)|superscript𝑓𝑝superscript𝒟𝑝superscript𝑓𝑟superscript𝒟𝑟|\mathcal{E}(f^{p},\mathcal{D}^{p})-\mathcal{E}(f^{r},\mathcal{D}^{r})|| caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) - caligraphic_E ( italic_f start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , caligraphic_D start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) | and values nearing to zero denoting optimal model performance preservation. We use the Area Under the ROC Curve (AUC)[26] and accuracy (ACC) for quantitative analysis. Experiment results show alignment between AUC and ACC, but here we focus on ACC due to space limitations, with detailed AUC available in supplementary materials.

Method Comparison.

To quantitatively evaluate DataCook’s performance, we utilize several benchmarking methods for comparison: Random Noise: Introducing random Gaussian noise as a basic benchmark; Adversarial Target: This theoretical benchmark, represented as an oracle state with provided true data labels, aims to delineate the upper limits of performance, particularly infeasible during deployment; Adv/AntiAdv: Differentiating adversarial from anti-adversarial examples to clarify the practical implications of each strategy.

Datasets.

To rigorously evaluate our method, we utilize the MedMNIST dataset, a comprehensive repository of pre-processed medical images available for public use, cited from both medmnistv2 and medmnistv1. MedMNIST is tailored for benchmarking machine learning models on a wide array of medical image analysis tasks, encompassing diverse data types such as 2D, 3D, and high-resolution images. This diversity positions MedMNIST as an exemplary dataset for testing the efficacy and versatility of our approach across varied medical imaging contexts. Our evaluation strategy involves computing the average performance across all MedMNIST subsets, offering a detailed assessment of our method’s capability in the sophisticated domain of medical image analysis, ensuring both logical coherence and succinctness in our validation process.

Configuration.

Our experiments leveraged PyTorch 2.0  [1] for evaluating wide applicability of DataCook across diverse model architectures, including ResNet18, ResNet50 [5], VGG16 [7] and ConvNet [6]. Training for 2D, 3D, and high-resolution datasets involved 200 epochs using Stochastic Gradient Descent [24], with batch size of 128, momentum of 0.9, and learning rate of 1×1031superscript1031\times 10^{-3}1 × 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT. For protected dataset generation, a uniform learning rate of 5×1035superscript1035\times 10^{-3}5 × 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT was applied. All protected data achieved an SSIM over 0.8 to the corresponding raw data.

3.2 Quantitative Experiments on 2D Data

Settings.

The MedMNIST 2D dataset includes 12 subsets, ranging from 100 to over 100,000 images. We analyzed 11 of these datasets—excluding Chest [12] due to its multi-label focus, which our method doesn’t currently support. The datasets include Path [11], Derma [13, 14], OCT [15], Pneumonia, Retina, Breast [16], Blood [17], Tissue [18], OrganA, OrganC, and OrganS. Despite the limitation, we foresee potential future adaptation to multi-label tasks. In addition to standard 2D datasets, MedMNIST also provides high-resolution versions 224×224224224224\times 224224 × 224 of 2D datasets. Given computational constraints, we particularly focused on Blood [17], Retina, OrganA, Breast [16], and Pneumonia for deeper investigation.

Table 1: Protection Results on 2D datasets. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in ACC of CP and PP across all 2D datasets; Underline data entries highlight key comparative findings.
ResNet-18 ResNet-50 VGG-16 ConvNext-t
Method CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow
Random Noise -2.65 -1.61 -4.43 -2.01 -3.43 -1.92 -1.05 -2.08
Adv Oracle -35.17 -17.66 -16.52 -7.44 -42.58 -12.79 -39.59 -32.21
Pseudo -27.02 -1.58 -11.99 -4.51 -24.95 -2.41 -18.37 -3.54
AntiAdv Oracle -13.15 -22.14 -7.68 -16.45 -12.27 -18.33 -7.91 -32.4
Pseudo -18.78 -0.67 -9.63 -0.91 -16.1 -0.56 -13.8 -1.59

Results.

As shown in Tab. 1, our analysis reveals three key insights. First, perturbations introduced by DataCook significantly outperform random noise in enhancing copyright protection, effectively demonstrating that Anti Adv examples can achieve the optimization objectives outlined in Eq. 1. Second, the Oracle state, despite offering better copyright protection, substantially impairs model performance as shown in data with underline, underscoring a crucial trade-off. This finding affirms that the use of pseudo-labels does not compromise model performance. Third, the juxtaposition of Adv and Anti Adv examples indicates that while Anti Adv examples may provide marginally less robust copyright protection, they excel in preserving model performance. This aligns with our main goal of copyright protection without impacting model performance.

Ablation on Optimization Procedure.

Our study includes ablation experiments focusing on optimization procedures, which examine the impact of external adversarial targets, various loss functions, and optimizers. The results are depicted in supplementary materials. We show the current method to craft anti-adversarial examples by pseudo label is the best.

Generalize to High-Resolution Data.

We also conducted tests on the DataCook using high-resolution data, with the results detailed in the supplementary materials. These outcomes are consistent with those obtained from 2D data, demonstrating theDataCook robustness and applicability across different data formats.

3.3 Quantitative Experiments on 3D Data

Settings.

The MedMNIST 3D datasets include six subsets, ranging from from 1,500 to 200 images. Given computational constraints, we focused on OrganMNIST3D [19], NoduleMNIST3D, and FractureMNIST3D.

Table 2: Protection Results on 3D data. We evaluate CP and PP on the 3D data datasets comparing various model. The reported values represent the average change in ACC of CP and PP across all 3D datasets. Underline data entries highlight key comparative findings.
ResNet-18 ResNet-50 VGG-16 ConvNext-t
Method CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow
Random Noisy -2.94 -3.13 -3.6 -1.2 -1.87 -1.43 0.67 -0.53
Adv Oracle -6.03 -23.93 -6.97 -20.01 -33.37 -21.93 -17.41 -38.81
Pseudo -1.44 -2.16 1.26 -2.11 -13.61 -3.96 -8.81 -2.63
AntiAdv Oracle -10.44 -26.23 -9.71 -27.23 -11.11 -24.31 0.43 -38.96
Pseudo -18.67 -1.75 -1.81 -1.35 -26.7 -0.91 -15.27 -1.01

Results.

Tab. 2 presents the outcomes from the 3D datasets, revealing trends consistent with those found in the 2D counterparts, yet with distinct nuances. Notably, Anti Adv examples demonstrate superior performance in CP and PP over Adv examples in larger datasets as shown in data with underline, a phenomenon more evident that Anti Adv example can deepen its effects in the context of 3D data. However, this protective efficacy diminishes in deeper architectures like ResNet50, hinting that such networks might overlook perturbations in 3D data. This observation suggests an avenue for enhancement in future research. Intriguingly, pseudo labels yield better results than the Oracle in ResNet18 and ConvNet models, suggesting that providing data labels might lead to reduced performance with 3D datasets, highlighting the complexity of model-data interactions in 3D spaces.

Analysis on DataCook-Generated Perturbation

As illustrated in Fig. 3. The selected model exhibits identical training and testing results across raw, protected, and perturbed data, and this effect persists across other models as well. Firstly, this indicates a stable capacity for performance perseverance. Secondly, it suggests that perturbation serves as an effective feature, impacting both model learning and performance. In comparison to protected data, raw data lacks the effective feature of perturbation, which is essential for validating Eq. 1. The bottom figure shows that despite the introduction of protective perturbations, the visual alterations to protected data are subtle to the human eye. This reveals that DataCook, through its integration of label information via classifiable perturbations, does not degrade model performance or visual perceptibly, contrasting with data compression methods[34, 35] that might compromise the integrity of raw data. Our analysis demonstrates that DataCook is designed to offer complete confidentiality and obscurity, enabling its use without detection by others, thus significantly increasing its practical utility.

Refer to caption
Figure 3: Top: The figure presents a comparison among the performance outcomes raw data, protected data, and perturbation on convnet. Bottom: Raw data, Perturbation, and protected data comparison.

4 Conclusion

In conclusion, this study introduces DataCook, a pioneering approach for healthcare data copyright protection during the deployment phase, effectively addressing the challenges of unauthorized third-party misuse. Through the innovative use of anti-adversarial examples, DataCook ensures that data remains protected and only accessible by authorized users, without compromising data integrity and model performance.

References

  • [1] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” vol. 32, 2019.
  • [2] J. Yang, Y. Jiang, X. Huang, B. Ni, and C. Zhao, “Learning black-box attackers with transferable priors and query feedback,” vol. 33, pp. 12 288–12 299, 2020.
  • [3] J. Ma, Y. Zhang, S. Gu, C. Zhu, C. Ge, Y. Zhang, X. An, C. Wang, Q. Wang, X. Liu, S. Cao, Q. Zhang, S. Liu, Y. Wang, Y. Li, J. He, and X. Yang, “Abdomenct-1k: Is abdominal organ segmentation a solved problem?” 2021.
  • [4] J. Yang, X. Huang, Y. He, J. Xu, C. Yang, G. Xu, and B. Ni, “Reinventing 2d convolutions for 3d images,” vol. 25, no. 8, pp. 3009–3018, 2021.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016, pp. 770–778.
  • [6] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” 2022, pp. 11 976–11 986.
  • [7] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International conference on learning representations, 2015.
  • [8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” International Conference on Learning Representations, 2021.
  • [9] J. Yang, R. Shi, D. Wei, Z. Liu, L. Zhao, B. Ke, H. Pfister, and B. Ni, “Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification,” Scientific Data, vol. 10, no. 1, p. 41, 2023.
  • [10] J. Yang, R. Shi, and B. Ni, “Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis,” in IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, pp. 191–195.
  • [11] J. N. Kather, J. Krisam et al., “Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study,” PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.
  • [12] X. Wang, Y. Peng et al., “Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in CVPR, 2017, pp. 3462–3471.
  • [13] P. Tschandl, C. Rosendahl, and H. Kittler, “The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific data, p. 180161, 2018.
  • [14] N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti et al., “Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic),” arXiv preprint arXiv:1902.03368, 2019.
  • [15] D. S. Kermany, M. Goldbaum et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.
  • [16] W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, “Dataset of breast ultrasound images,” Data in Brief, vol. 28, p. 104863, 2020.
  • [17] A. Acevedo, A. Merino, S. Alférez, Ángel Molina, L. Boldú, and J. Rodellar, “A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,” Data in Brief, vol. 30, p. 105474, 2020.
  • [18] V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter, “Annotated high-throughput microscopy image sets for validation.” Nature methods, vol. 9, no. 7, pp. 637–637, 2012.
  • [19] P. Bilic, P. F. Christ et al., “The liver tumor segmentation benchmark (lits),” CoRR, vol. abs/1901.04056, 2019.
  • [20] X. Xu, F. Zhou et al., “Efficient multiple organ localization in ct image using 3d region proposal network,” IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.
  • [21] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, E. A. Kazerooni, H. MacMahon, E. J. R. van Beek, D. Yankelevitz, A. M. Biancardi, P. H. Bland, M. S. Brown, R. M. Engelmann, G. E. Laderach, D. Max, R. C. Pais, D. P.-Y. Qing, R. Y. Roberts, A. R. Smith, A. Starkey, P. Batra, P. Caligiuri, A. Farooqi, G. W. Gladish, C. M. Jude, R. F. Munden, I. Petkovska, L. E. Quint, L. H. Schwartz, B. Sundaram, L. E. Dodd, C. Fenimore, D. Gur, N. Petrick, J. Freymann, J. Kirby, B. Hughes, A. Vande Casteele, S. Gupte, M. Sallam, M. D. Heath, M. H. Kuhn, E. Dharaiya, R. Burns, D. S. Fryd, M. Salganicoff, V. Anand, U. Shreter, S. Vastagh, B. Y. Croft, and L. P. Clarke, “The lung image database consortium (lidc) and image database resource initiative (idri): A completed reference database of lung nodules on ct scans,” Medical Physics, vol. 38, no. 2, pp. 915–931, 2011.
  • [22] L. **, J. Yang, K. Kuang, B. Ni, Y. Gao, Y. Sun, P. Gao, W. Ma, M. Tan, H. Kang, J. Chen, and M. Li, “Deep-learning-assisted detection and segmentation of rib fractures from ct scans: Development and validation of fracnet,” EBioMedicine, vol. 62, p. 103106, 2020.
  • [23] X. Yang, D. Xia, T. Kin, and T. Igarashi, “Intra: 3d intracranial aneurysm dataset for deep learning,” in CVPR, June 2020.
  • [24] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 1139–1147.
  • [25] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in IJCAI, vol. 14, no. 2, 1995, pp. 1137–1145.
  • [26] A. P. Bradley, “The use of the area under the roc curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, pp. 1145–1159, 1997.
  • [27] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
  • [28] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” in Advances in Neural Information Processing Systems, vol. 32.   Curran Associates, Inc., 2019.
  • [29] Q. Wang, F. Liu, Y. Zhang, J. Zhang, C. Gong, T. Liu, and B. Han, “Watermarking for out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 35, pp. 15 545–15 557, 2022.
  • [30] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017.
  • [31] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [32] T. N. Le, T. Gu, H. H. Nguyen, and I. Echizen, “Rethinking adversarial examples for location privacy protection,” in 2022 IEEE International Workshop on Information Forensics and Security (WIFS).   IEEE, December 2022, pp. 1–6.
  • [33] J. Liu, C. P. Lau, and R. Chellappa, “Diffprotect: Generate adversarial examples with diffusion models for facial privacy protection,” arXiv preprint arXiv:2305.13625, 2023.
  • [34] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” arXiv preprint arXiv:1802.01436, 2018.
  • [35] D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
  • [36] W. N. Price and I. G. Cohen, “Privacy in the age of medical big data,” Nature Medicine, vol. 25, no. 1, pp. 37–43, 2019.
  • [37] J. Yoon, M. Mizrahi, N. F. Ghalaty, T. Jarvinen, A. S. Ravi, P. Brune et al., “Ehr-safe: generating high-fidelity and privacy-preserving synthetic electronic health records,” NPJ Digital Medicine, vol. 6, no. 1, p. 141, 2023.
  • [38] A. Mohanarathinam, S. Kamalraj, G. K. D. Prasanna Venkatesan, R. V. Ravi, and C. S. Manikandababu, “Digital watermarking techniques for image security: a review,” Journal of Ambient Intelligence and Humanized Computing, vol. 11, pp. 3221–3229, 2020.
  • [39] X. Wei, B. Pu, S. Zhao, C. Chi, and H. Fu, “Preventing unauthorized ai over-analysis by medical image adversarial watermarking,” 2023.
  • [40] M. Alfarra, J. C. Pérez, A. Thabet, A. Bibi, P. H. Torr, and B. Ghanem, “Combating adversaries with anti-adversaries,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6, June 2022, pp. 5992–6000.
  • [41] Y. Yang et al., “A digital mask to safeguard patient privacy,” Nature Medicine, vol. 28, pp. 1883–1892, 2022.
  • [42] X. Guo, M. A. Khalid, I. Domingos, A. L. Michala, M. Adriko, C. Rowel et al., “Smartphone-based dna diagnostics for malaria detection using deep learning for local decision support and blockchain technology for security,” Nature Electronics, vol. 4, no. 8, pp. 615–624, 2021.
  • [43] J. C.-S. Cheung, “Vaccination: keep records secure with blockchain,” Nature, vol. 590, pp. 389–390, 2021.
  • [44] Q. Wang, F. Liu, Y. Zhang, J. Zhang, C. Gong, T. Liu, and B. Han, “Watermarking for out-of-distribution detection,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 15 545–15 557.
  • [45] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in International Conference on Machine Learning.   PMLR, May 2019, pp. 7472–7482.
  • [46] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
  • [47] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
  • [48] M. Ienca, “Don’t pause giant ai for the wrong reasons,” Nat. Mach. Intell., pp. 1–2, 2023.
  • [49] B. Smith, “Stop talking about tomorrow’s ai doomsday when ai poses risks today,” Nature, vol. 618, pp. 885–886, 2023.

Supplementary Materials
DataCook
: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection

Sihan ShangJiancheng YangZhenglong Sun Pascal Fua

Table A1: Protection Results(ACC) on 2D data. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in ACC of CP and PP across all 2D datasets. Target (T)delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology.
Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
L T O CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow
Random Noise -2.65 -1.61 -4.43 -2.01 -3.43 -1.92 -1.05 -2.08
Adv Logit Oracle Adam -35.17 -17.66 -16.52 -7.44 -42.58 -12.79 -39.59 -32.21
logP𝑃\log Proman_log italic_P Oracle Adam -21.31 -14.39 -16.01 -7.85 -31.51 -14.34 -29.21 -27.97
Logit Max P𝑃Pitalic_P Adam -6.19 -3.74 -6.63 -4.53 -16.98 -3.11 -7.06 -6.25
logP𝑃\log Proman_log italic_P Pseudo Adam -18.82 -3.04 -12.44 -5.03 -17.01 -3.49 -14.52 -1.29
Logit Pseudo SGD-M -17.56 -2.41 -8.81 -6.03 -36.37 -2.95 -26.63 -5.19
Logit Pseudo Adam -27.02 -1.58 -11.99 -4.51 -24.95 -2.41 -18.37 -3.54
Anti Adv Logit Oracle Adam -13.15 -22.14 -7.68 -16.45 -12.27 -18.33 -7.91 -32.4
logP𝑃\log Proman_log italic_P Oracle Adam -7.98 -22.19 -8.06 -16.41 -6.58 -18.71 -3.19 -32.39
Logit Max P𝑃Pitalic_P Adam -18.36 -0.75 -8.03 -1.17 -15.99 -1.01 -13.65 -1.58
logP𝑃\log Proman_log italic_P Pseudo Adam -12.47 -0.78 -10.29 -0.72 -11.29 -0.53 -8.87 -0.96
Logit Pseudo SGD-M -15.71 -0.69 -7.41 -1.28 -20.32 -0.74 -15.51 -1.26
Logit Pseudo Adam -18.78 -0.67 -9.63 -0.91 -16.11 -0.56 -13.81 -1.59
Table A2: Protection Results(AUC) on 2D data. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in AUC of CP and PP across all 2D datasets. The bold denoted the best performance on PP.Target (T)delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology.
Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
L T O CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow
Random Noise 0.01 -1.71 -0.44 -1.67 0.01 -1.38 0.21 -0.72
Adv Logit Oracle Adam -25.03 -6.88 -8.35 -3.69 -24.86 -6.27 -27.53 -13.51
logP𝑃\log Proman_log italic_P Oracle Adam -16.73 -6.52 -8.55 -3.83 -17.78 -5.86 -18.18 -11.82
Logit Max P𝑃Pitalic_P Adam -4.16 -3.75 -2.96 -2.08 -3.96 -1.81 -1.04 -1.81
logP𝑃\log Proman_log italic_P Pseudo Adam -9.08 -3.52 -6.55 -2.57 -7.88 -2.39 -6.04 -1.26
Logit Pseudo SGD-M -8.71 -3.83 -2.81 -2.94 -14.08 -2.61 -9.58 -1.67
Logit Pseudo Adam -12.82 -4.29 -5.55 -2.41 -10.98 -2.21 -6.94 -1.11
Anti Adv Logit Oracle Adam -4.07 -9.29 -3.14 -7.91 -2.69 -7.66 -3.35 -13.51
logP𝑃\log Proman_log italic_P Oracle Adam -3.83 -9.11 -3.19 -7.45 -1.31 -7.07 -0.82 -13.51
Logit Max P𝑃Pitalic_P Adam -7.23 -2.97 -3.35 -2.65 -3.28 -1.36 -4.61 -1.35
logP𝑃\log Proman_log italic_P Pseudo Adam -7.11 -1.61 -3.52 -2.85 -2.24 -1.29 -2.98 -0.72
Logit Pseudo SGD-M -5.69 -3.62 -1.02 -1.72 -3.54 -1.49 -5.78 -1.74
Logit Pseudo Adam -7.19 -2.96 -3.11 -2.99 -3.66 -1.62 -4.62 -1.35
Table A3: Protection Results on High-Resolution datasets. We evaluate CP and PP on the High-Resolution datasets , comparing various models. The reported values represent the average percentage change in CP ACC and PP ACC across all High-Resolution datasets.the results in the High-Resolution Dataset present a quantitative analysis that mirrors our previously discussed findings in 2D datasets.
ResNet-18 ResNet-50 VGG-16 ConvNext-t
Method CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow CP\downarrow PP\uparrow
Random Noisy -0.82 -0.47 -0.7 -0.9 -0.8 -0.82 -0.7 -0.93
Adv Oracle -26.77 -14.05 -12.45 -4.35 -14.4 -4.52 -49.77 -22.97
Pseudo -15.52 -0.4 -6.92 -0.72 -12.81 -0.55 -23.42 -3.5
AntiAdv Oracle -19.47 -15.8 -25.25 -16.42 -10.67 -16.72 -6.69 -13.1
Pseudo -17.75 -0.42 -9.45 -0.61 -13.2 -0.15 -15.37 -0.62
Table A4: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
DermanMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.744 0.747 0.717 0.717 0.702 0.722 0.706 0.714
Protected 0.728 0.743 0.707 0.712 0.697 0.724 0.702 0.711
Adv Logit Pseudo SGD-M Benign 0.744 0.697 0.717 0.682 0.702 0.681 0.706 0.663
Protected 0.109 0.713 0.635 0.683 0.146 0.697 0.036 0.705
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.744 0.689 0.717 0.694 0.702 0.241 0.706 0.271
Protected 0.031 0.858 0.292 0.71 0.147 0.886 0 1
Logit Oracle Adam Benign 0.744 0.701 0.717 0.688 0.702 0.143 0.706 0.089
Protected 0.058 0.841 0.447 0.697 0.189 0.905 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.744 0.685 0.717 0.691 0.702 0.699 0.706 0.697
Protected 0.303 0.695 0.565 0.701 0.245 0.707 0.151 0.714
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.744 0.694 0.717 0.696 0.702 0.692 0.706 0.697
Protected 0.142 0.711 0.371 0.701 0.247 0.706 0.155 0.702
Logit Pseudo Adam Benign 0.744 0.686 0.717 0.681 0.702 0.695 0.706 0.693
Protected 0.161 0.715 0.505 0.703 0.259 0.715 0.052 0.704
Anti Adv Logit Pseudo SGD-M Benign 0.744 0.687 0.717 0.694 0.702 0.475 0.706 0.668
Protected 0.743 0.742 0.69 0.713 0.689 0.706 0.706 0.714
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.744 0.698 0.717 0.701 0.702 0.684 0.706 0.683
Protected 0.972 0.989 0.839 0.883 0.863 0.923 0.999 1
Logit Oracle Adam Benign 0.744 0.693 0.717 0.697 0.702 0.658 0.706 0.628
Protected 0.968 0.987 0.742 0.851 0.756 0.932 0.998 1
Logit Max P𝑃Pitalic_P Adam Benign 0.744 0.689 0.717 0.687 0.702 0.654 0.706 0.709
Protected 0.716 0.743 0.678 0.708 0.691 0.713 0.706 0.717
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.744 0.684 0.717 0.692 0.702 0.661 0.706 0.705
Protected 0.741 0.743 0.706 0.705 0.702 0.707 0.706 0.711
Logit Pseudo Adam Benign 0.744 0.686 0.717 0.694 0.702 0.663 0.706 0.709
Protected 0.741 0.745 0.689 0.711 0.691 0.711 0.706 0.717
BreastMNIST Random Noise Benign 0.756 0.808 0.744 0.757 0.801 0.801 0.699 0.705
Protected 0.756 0.821 0.776 0.751 0.801 0.801 0.731 0.737
Adv Logit Pseudo SGD-M Benign 0.756 0.731 0.744 0.494 0.801 0.776 0.699 0.282
Protected 0.218 0.769 0.628 0.591 0.737 0.801 0.269 0.269
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.756 0.718 0.744 0.442 0.801 0.718 0.699 0.436
Protected 0 0.955 0.038 0.532 0 0.911 0.013 1
Logit Oracle Adam Benign 0.756 0.705 0.744 0.583 0.801 0.744 0.699 0.532
Protected 0 0.923 0.045 0.596 0 0.841 0.212 1
Logit Max P𝑃Pitalic_P Adam Benign 0.756 0.763 0.744 0.641 0.801 0.756 0.699 0.756
Protected 0.276 0.635 0.314 0.654 0.429 0.801 0.269 0.333
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.756 0.679 0.744 0.481 0.801 0.718 0.699 0.391
Protected 0.212 0.724 0.231 0.686 0.186 0.756 0.199 0.647
Logit Pseudo Adam Benign 0.756 0.724 0.744 0.494 0.801 0.756 0.699 0.558
Protected 0.212 0.763 0.244 0.679 0.186 0.782 0.269 0.397
Anti Adv Logit Pseudo SGD-M Benign 0.756 0.365 0.744 0.577 0.801 0.519 0.699 0.699
Protected 0.788 0.788 0.661 0.686 0.808 0.827 0.801 0.731
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.756 0.635 0.744 0.455 0.801 0.596 0.699 0.731
Protected 1 1 0.974 0.936 1 1 0.987 1
Logit Oracle Adam Benign 0.756 0.474 0.744 0.455 0.801 0.455 0.699 0.699
Protected 0.974 0.944 0.994 0.949 0.994 1 0.917 1
Logit Max P𝑃Pitalic_P Adam Benign 0.756 0.295 0.744 0.545 0.801 0.462 0.699 0.538
Protected 0.795 0.801 0.737 0.718 0.833 0.827 0.808 0.756
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.756 0.571 0.744 0.455 0.801 0.423 0.699 0.442
Protected 0.788 0.795 0.782 0.763 0.814 0.821 0.801 0.661
Logit Pseudo Adam Benign 0.756 0.288 0.744 0.449 0.801 0.429 0.699 0.538
Protected 0.788 0.795 0.795 0.776 0.814 0.808 0.808 0.756
OctMNIST Random Noise Benign 0.655 0.558 0.753 0.694 0.784 0.695 0.639 0.618
Protected 0.356 0.629 0.474 0.761 0.379 0.71 0.46 0.64
Adv Logit Pseudo SGD-M Benign 0.655 0.275 0.753 0.685 0.784 0.458 0.639 0.307
Protected 0.083 0.672 0.291 0.664 0.096 0.772 0.054 0.645
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.655 0.191 0.753 0.512 0.784 0.454 0.639 0.235
Protected 0 1 0.036 0.863 0.003 0.962 0 0.999
Logit Oracle Adam Benign 0.655 0.117 0.753 0.359 0.784 0.326 0.639 0.224
Protected 0 1 0.1 0.914 0.001 1 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.655 0.672 0.753 0.636 0.784 0.554 0.639 0.633
Protected 0.232 0.633 0.271 0.664 0.241 0.761 0.261 0.655
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.655 0.237 0.753 0.542 0.784 0.46 0.639 0.355
Protected 0.135 0.654 0.071 0.675 0.064 0.749 0.109 0.633
Logit Pseudo Adam Benign 0.655 0.491 0.753 0.523 0.784 0.32 0.639 0.353
Protected 0.07 0.659 0.146 0.693 0.013 0.783 0.047 0.635
Anti Adv Logit Pseudo SGD-M Benign 0.655 0.546 0.753 0.678 0.784 0.629 0.639 0.499
Protected 0.655 0.664 0.304 0.731 0.635 0.786 0.639 0.638
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.655 0.701 0.753 0.694 0.784 0.781 0.639 0.644
Protected 1 1 0.78 0.96 0.987 1 1 1
Logit Oracle Adam Benign 0.655 0.607 0.753 0.698 0.784 0.807 0.639 0.653
Protected 1 1 0.744 0.988 0.999 1 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.655 0.591 0.753 0.712 0.784 0.717 0.639 0.52
Protected 0.655 0.658 0.401 0.726 0.716 0.786 0.639 0.642
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.655 0.622 0.753 0.657 0.784 0.671 0.639 0.531
Protected 0.655 0.67 0.652 0.745 0.772 0.783 0.639 0.644
Logit Pseudo Adam Benign 0.655 0.596 0.753 0.632 0.784 0.726 0.639 0.521
Protected 0.655 0.659 0.612 0.749 0.783 0.783 0.639 0.642
Table A5: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
PneumoniaMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.875 0.846 0.862 0.861 0.865 0.845 0.875 0.899
Protected 0.885 0.862 0.857 0.856 0.867 0.861 0.853 0.873
Adv Logit Pseudo SGD-M Benign 0.857 0.715 0.862 0.741 0.865 0.561 0.875 0.657
Protected 0.357 0.833 0.575 0.835 0.197 0.883 0.162 0.854
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.857 0.646 0.862 0.631 0.865 0.441 0.875 0.761
Protected 0.013 0.991 0.053 0.883 0.042 1 0 0.998
Logit Oracle Adam Benign 0.857 0.641 0.862 0.67 0.865 0.444 0.875 0.704
Protected 0.054 0.974 0.037 0.854 0.064 0.998 0 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.857 0.628 0.862 0.736 0.865 0.841 0.875 0.731
Protected 0.462 0.83 0.478 0.851 0.428 0.885 0.375 0.745
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.857 0.667 0.862 0.635 0.865 0.606 0.875 0.806
Protected 0.173 0.838 0.191 0.824 0.196 0.888 0.162 0.843
Logit Pseudo Adam Benign 0.857 0.673 0.862 0.734 0.865 0.758 0.875 0.617
Protected 0.215 0.83 0.197 0.843 0.218 0.877 0.162 0.851
Anti Adv Logit Pseudo SGD-M Benign 0.857 0.715 0.862 0.702 0.865 0.646 0.875 0.822
Protected 0.822 0.84 0.612 0.835 0.846 0.867 0.838 0.838
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.857 0.625 0.862 0.739 0.865 0.711 0.875 0.875
Protected 0.997 0.995 0.982 0.968 1 0.998 1 1
Logit Oracle Adam Benign 0.857 0.63 0.862 0.761 0.865 0.7 0.875 0.875
Protected 0.998 0.995 0.974 0.984 1 0.995 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.857 0.701 0.862 0.724 0.865 0.755 0.875 0.817
Protected 0.841 0.841 0.764 0.831 0.846 0.848 0.838 0.812
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.857 0.639 0.862 0.744 0.865 0.832 0.875 0.859
Protected 0.84 0.841 0.825 0.835 0.846 0.856 0.838 0.851
Logit Pseudo Adam Benign 0.857 0.697 0.862 0.747 0.865 0.755 0.875 0.817
Protected 0.841 0.84 0.817 0.838 0.846 0.848 0.838 0.812
RetianMNIST Random Noise Benign 0.495 0.528 0.505 0.515 0.505 0.502 0.497 0.517
Protected 0.467 0.528 0.501 0.535 0.502 0.507 0.507 0.551
Adv Logit Pseudo SGD-M Benign 0.495 0.417 0.505 0.471 0.505 0.438 0.497 0.527
Protected 0.152 0.487 0.401 0.481 0.312 0.443 0.113 0.521
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.495 0.328 0.505 0.492 0.505 0.168 0.497 0.265
Protected 0 0.811 0.072 0.463 0.107 0.733 0.028 1
Logit Oracle Adam Benign 0.495 0.312 0.505 0.472 0.505 0.233 0.497 0.102
Protected 0 0.797 0.142 0.49 0.233 0.627 0.058 1
Logit Max P𝑃Pitalic_P Adam Benign 0.495 0.485 0.505 0.453 0.505 0.443 0.497 0.491
Protected 0.241 0.501 0.372 0.501 0.315 0.482 0.228 0.547
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.495 0.263 0.505 0.481 0.505 0.441 0.497 0.522
Protected 0.133 0.475 0.198 0.487 0.247 0.502 0.163 0.505
Logit Pseudo Adam Benign 0.495 0.345 0.505 0.481 0.505 0.438 0.497 0.491
Protected 0.142 0.482 0.28 0.502 0.307 0.465 0.172 0.502
Anti Adv Logit Pseudo SGD-M Benign 0.495 0.355 0.505 0.448 0.505 0.305 0.497 0.537
Protected 0.495 0.492 0.435 0.497 0.505 0.525 0.497 0.497
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.495 0.448 0.505 0.463 0.505 0.427 0.497 0.501
Protected 0.981 0.985 0.745 0.701 0.651 0.748 0.988 1
Logit Oracle Adam Benign 0.495 0.468 0.505 0.472 0.505 0.23 0.497 0.453
Protected 0.985 0.983 0.681 0.691 0.651 0.762 0.971 1
Logit Max P𝑃Pitalic_P Adam Benign 0.495 0.351 0.505 0.461 0.505 0.333 0.497 0.491
Protected 0.491 0.505 0.445 0.505 0.443 0.521 0.497 0.497
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.495 0.422 0.505 0.445 0.505 0.323 0.497 0.485
Protected 0.495 0.497 0.482 0.495 0.505 0.522 0.497 0.497
Logit Pseudo Adam Benign 0.495 0.378 0.505 0.492 0.505 0.318 0.497 0.491
Protected 0.497 0.497 0.472 0.482 0.505 0.517 0.497 0.497
PathMNIST Random Noise Benign 0.875 0.869 0.868 0.759 0.864 0.887 0.72 0.682
Protected 0.605 0.878 0.552 0.723 0.458 0.839 0.452 0.707
Adv Logit Pseudo SGD-M Benign 0.875 0.612 0.868 0.831 0.864 0.284 0.721 0.311
Protected 0.159 0.84 0.253 0.823 0.128 0.853 0.077 0.723
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.875 0.399 0.868 0.722 0.864 0.584 0.72 0.297
Protected 0.119 0.891 0.131 0.847 0.081 0.952 0.043 0.988
Logit Oracle Adam Benign 0.875 0.355 0.868 0.695 0.864 0.214 0.72 0.287
Protected 0.122 0.948 0.143 0.811 0.105 0.973 0.071 0.992
Logit Max P𝑃Pitalic_P Adam Benign 0.875 0.745 0.868 0.819 0.864 0.354 0.721 0.529
Protected 0.222 0.829 0.214 0.805 0.147 0.847 0.175 0.721
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.875 0.426 0.868 0.756 0.864 0.607 0.721 0.404
Protected 0.168 0.822 0.176 0.816 0.121 0.838 0.156 0.726
Logit Pseudo Adam Benign 0.875 0.368 0.868 0.726 0.864 0.284 0.721 0.375
Protected 0.153 0.855 0.173 0.808 0.116 0.851 0.083 0.739
Anti Adv Logit Pseudo SGD-M Benign 0.875 0.814 0.868 0.873 0.864 0.682 0.72 0.552
Protected 0.711 0.881 0.436 0.871 0.751 0.852 0.72 0.734
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.875 0.775 0.868 0.781 0.864 0.841 0.721 0.592
Protected 0.954 0.984 0.914 0.979 0.904 0.981 0.992 0.999
Logit Oracle Adam Benign 0.875 0.757 0.868 0.771 0.864 0.804 0.721 0.542
Protected 0.939 0.985 0.895 0.984 0.933 0.986 0.989 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.875 0.805 0.868 0.842 0.864 0.617 0.72 0.562
Protected 0.552 0.875 0.531 0.869 0.742 0.845 0.721 0.736
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.875 0.744 0.868 0.739 0.864 0.756 0.721 0.578
Protected 0.856 0.882 0.816 0.868 0.808 0.858 0.721 0.734
Logit Pseudo Adam Benign 0.875 0.736 0.868 0.741 0.864 0.633 0.72 0.545
Protected 0.841 0.882 0.793 0.868 0.828 0.856 0.721 0.738
Table A6: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
BloodMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.937 0.891 0.921 0.812 0.905 0.836 0.845 0.818
Protected 0.855 0.935 0.821 0.968 0.721 0.932 0.809 0.825
Adv Logit Pseudo SGD-M Benign 0.937 0.575 0.921 0.661 0.905 0.198 0.845 0.471
Protected 0.028 0.919 0.151 0.836 0.013 0.904 0.011 0.849
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.937 0.607 0.921 0.671 0.905 0.486 0.845 0.317
Protected 0 0.961 0.005 0.879 0.028 0.988 0 1
Logit Oracle Adam Benign 0.937 0.505 0.92 0.542 0.905 0.231 0.845 0.232
Protected 0.011 0.988 0.072 0.901 0.006 0.993 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.937 0.768 0.921 0.764 0.905 0.291 0.845 0.605
Protected 0.183 0.894 0.232 0.841 0.124 0.884 0.142 0.834
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.937 0.629 0.921 0.671 0.905 0.717 0.845 0.602
Protected 0.044 0.902 0.055 0.849 0.056 0.892 0.058 0.841
Logit Pseudo Adam Benign 0.937 0.499 0.921 0.614 0.905 0.459 0.845 0.506
Protected 0.035 0.924 0.092 0.851 0.025 0.899 0.013 0.848
Anti Adv Logit Pseudo SGD-M Benign 0.937 0.557 0.92 0.739 0.905 0.474 0.845 0.539
Protected 0.937 0.937 0.747 0.915 0.547 0.91 0.845 0.849
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.937 0.734 0.921 0.774 0.905 0.791 0.845 0.836
Protected 1 1 1 0.999 0.996 1 1 1
Logit Oracle Adam Benign 0.937 0.786 0.92 0.818 0.905 0.729 0.845 0.838
Protected 1 1 0.999 0.999 0.567 0.999 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.937 0.507 0.921 0.691 0.905 0.593 0.845 0.586
Protected 0.937 0.937 0.877 0.897 0.521 0.918 0.845 0.844
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.937 0.498 0.921 0.659 0.905 0.782 0.845 0.747
Protected 0.937 0.937 0.921 0.921 0.902 0.904 0.845 0.847
Logit Pseudo Adam Benign 0.937 0.507 0.921 0.728 0.905 0.611 0.845 0.586
Protected 0.937 0.937 0.919 0.919 0.541 0.907 0.845 0.844
OrganCMNIST Random Noise Benign 0.886 0.882 0.885 0.875 0.878 0.86 0.645 0.693
Protected 0.883 0.89 0.874 0.881 0.86 0.874 0.649 0.697
Adv Logit Pseudo SGD-M Benign 0.886 0.868 0.885 0.858 0.878 0.325 0.645 0.334
Protected 0.085 0.872 0.291 0.84 0.121 0.813 0.061 0.656
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.886 0.878 0.885 0.856 0.878 0.686 0.645 0.285
Protected 0.016 0.858 0.021 0.815 0.17 0.815 0.007 0.995
Logit Oracle Adam Benign 0.886 0.409 0.885 0.851 0.878 0.585 0.645 0.221
Protected 0.014 0.975 0.079 0.802 0.179 0.871 0.012 0.997
Logit Max P𝑃Pitalic_P Adam Benign 0.886 0.879 0.885 0.871 0.878 0.814 0.645 0.493
Protected 0.144 0.858 0.33 0.856 0.224 0.801 0.114 0.714
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.886 0.88 0.885 0.855 0.878 0.724 0.645 0.561
Protected 0.071 0.871 0.082 0.831 0.229 0.794 0.127 0.663
Logit Pseudo Adam Benign 0.886 0.458 0.885 0.861 0.878 0.644 0.645 0.506
Protected 0.048 0.867 0.114 0.833 0.222 0.822 0.097 0.659
Anti Adv Logit Pseudo SGD-M Benign 0.886 0.828 0.885 0.845 0.878 0.754 0.645 0.323
Protected 0.905 0.905 0.866 0.884 0.878 0.879 0.645 0.66
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.886 0.869 0.885 0.861 0.878 0.87 0.645 0.561
Protected 0.995 0.995 0.993 0.991 0.986 0.989 0.987 0.997
Logit Oracle Adam Benign 0.886 0.802 0.885 0.871 0.878 0.87 0.645 0.491
Protected 0.997 0.996 0.991 0.992 0.974 0.977 0.982 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.886 0.753 0.885 0.839 0.878 0.791 0.645 0.508
Protected 0.886 0.886 0.881 0.884 0.878 0.88 0.645 0.656
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.886 0.845 0.885 0.859 0.878 0.836 0.645 0.571
Protected 0.886 0.885 0.885 0.886 0.878 0.878 0.645 0.652
Logit Pseudo Adam Benign 0.886 0.753 0.885 0.846 0.878 0.791 0.645 0.508
Protected 0.886 0.88 0.885 0.886 0.878 0.88 0.645 0.656
OrganAMNIST Random Noise Benign 0.867 0.869 0.927 0.922 0.868 0.851 0.763 0.768
Protected 0.856 0.872 0.901 0.937 0.849 0.871 0.751 0.774
Adv Logit Pseudo SGD-M Benign 0.867 0.471 0.927 0.918 0.868 0.326 0.763 0.378
Protected 0.052 0.849 0.231 0.883 0.151 0.786 0.043 0.762
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.867 0.827 0.927 0.913 0.868 0.764 0.763 0.592
Protected 0.008 0.803 0.019 0.861 0.248 0.825 0.005 0.969
Logit Oracle Adam Benign 0.867 0.49 0.927 0.915 0.868 0.593 0.763 0.447
Protected 0.004 0.979 0.078 0.858 0.195 0.855 0.008 0.988
Logit Max P𝑃Pitalic_P Adam Benign 0.867 0.881 0.927 0.939 0.868 0.776 0.763 0.792
Protected 0.141 0.862 0.317 0.898 0.239 0.798 0.098 0.786
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.867 0.787 0.927 0.911 0.868 0.761 0.763 0.641
Protected 0.075 0.836 0.063 0.876 0.319 0.792 0.114 0.767
Logit Pseudo Adam Benign 0.867 0.509 0.927 0.922 0.868 0.606 0.763 0.476
Protected 0.05 0.862 0.111 0.879 0.251 0.804 0.041 0.763
Anti Adv Logit Pseudo SGD-M Benign 0.867 0.712 0.927 0.919 0.868 0.757 0.763 0.399
Protected 0.867 0.867 0.916 0.927 0.868 0.867 0.763 0.762
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.867 0.862 0.927 0.921 0.868 0.857 0.763 0.761
Protected 0.999 0.999 0.999 0.998 0.991 0.989 0.998 0.998
Logit Oracle Adam Benign 0.867 0.809 0.927 0.919 0.868 0.855 0.763 0.764
Protected 0.999 0.998 1 0.998 0.978 0.979 0.998 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.867 0.719 0.927 0.912 0.868 0.802 0.763 0.482
Protected 0.867 0.867 0.927 0.926 0.868 0.867 0.763 0.762
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.867 0.827 0.927 0.912 0.868 0.842 0.763 0.681
Protected 0.867 0.867 0.928 0.927 0.868 0.868 0.763 0.761
Logit Pseudo Adam Benign 0.867 0.719 0.927 0.912 0.868 0.802 0.763 0.482
Protected 0.867 0.867 0.928 0.927 0.868 0.867 0.763 0.762
Table A7: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
OrganSMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.723 0.701 0.751 0.737 0.737 0.744 0.481 0.456
Protected 0.711 0.715 0.771 0.747 0.748 0.769 0.469 0.466
Adv Logit Pseudo SGD-M Benign 0.723 0.675 0.751 0.713 0.737 0.225 0.481 0.278
Protected 0.09 0.677 0.213 0.693 0.104 0.698 0.053 0.499
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.723 0.639 0.751 0.657 0.737 0.441 0.481 0.199
Protected 0.01 0.723 0.022 0.661 0.064 0.866 0.008 0.997
Logit Oracle Adam Benign 0.723 0.261 0.751 0.732 0.737 0.301 0.481 0.163
Protected 0.014 0.977 0.065 0.685 0.096 0.872 0.013 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.723 0.733 0.751 0.75 0.737 0.681 0.481 0.406
Protected 0.139 0.709 0.28 0.726 0.158 0.689 0.101 0.493
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.723 0.664 0.751 0.697 0.737 0.543 0.481 0.413
Protected 0.149 0.683 0.144 0.687 0.192 0.681 0.161 0.485
Logit Pseudo Adam Benign 0.723 0.302 0.751 0.726 0.737 0.469 0.481 0.373
Protected 0.094 0.709 0.122 0.702 0.154 0.705 0.101 0.494
Anti Adv Logit Pseudo SGD-M Benign 0.723 0.641 0.751 0.716 0.737 0.636 0.481 0.28
Protected 0.76 0.723 0.736 0.748 0.727 0.736 0.481 0.494
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.723 0.681 0.751 0.721 0.737 0.714 0.481 0.401
Protected 0.998 0.998 0.998 0.992 0.985 0.987 0.989 0.998
Logit Oracle Adam Benign 0.723 0.556 0.751 0.731 0.737 0.672 0.481 0.274
Protected 0.994 0.994 0.987 0.982 0.912 0.945 0.983 0.999
Logit Max P𝑃Pitalic_P Adam Benign 0.723 0.479 0.751 0.711 0.737 0.631 0.481 0.347
Protected 0.722 0.722 0.749 0.747 0.732 0.731 0.481 0.486
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.723 0.647 0.751 0.701 0.737 0.673 0.481 0.402
Protected 0.722 0.721 0.751 0.748 0.737 0.737 0.481 0.481
Logit Pseudo Adam Benign 0.723 0.481 0.751 0.701 0.737 0.621 0.481 0.347
Protected 0.722 0.722 0.751 0.746 0.732 0.735 0.481 0.486
TissueMNIST Random Noise Benign 0.649 0.468 0.645 0.475 0.645 0.481 0.561 0.445
Protected 0.436 0.631 0.498 0.641 0.316 0.601 0.441 0.538
Adv Logit Pseudo SGD-M Benign 0.649 0.373 0.645 0.572 0.645 0.328 0.56 0.425
Protected 0.134 0.627 0.337 0.583 0.052 0.638 0.026 0.562
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.649 0.181 0.645 0.227 0.645 0.108 0.561 0.181
Protected 0.016 0.978 0.027 0.829 0.027 0.983 0 1
Logit Oracle Adam Benign 0.649 0.08 0.645 0.253 0.645 0.057 0.561 0.075
Protected 0.037 0.986 0.048 0.819 0.048 0.986 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.649 0.524 0.645 0.54 0.645 0.48 0.561 0.521
Protected 0.238 0.597 0.265 0.583 0.181 0.607 0.123 0.561
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.649 0.46 0.645 0.484 0.645 0.416 0.561 0.442
Protected 0.149 0.62 0.153 0.592 0.138 0.627 0.133 0.563
Logit Pseudo Adam Benign 0.649 0.417 0.645 0.498 0.645 0.381 0.56 0.462
Protected 0.13 0.626 0.141 0.586 0.095 0.637 0.026 0.563
Anti Adv Logit Pseudo SGD-M Benign 0.649 0.553 0.645 0.617 0.645 0.431 0.561 0.367
Protected 0.641 0.647 0.524 0.642 0.633 0.644 0.561 0.565
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.649 0.535 0.645 0.582 0.645 0.561 0.561 0.497
Protected 0.956 0.996 0.912 0.976 0.946 0.996 1 1
Logit Oracle Adam Benign 0.649 0.416 0.645 0.541 0.645 0.424 0.561 0.345
Protected 0.941 0.998 0.808 0.969 0.893 0.996 0.994 1
Logit Max P𝑃Pitalic_P Adam Benign 0.649 0.537 0.645 0.572 0.645 0.44 0.56 0.369
Protected 0.62 0.643 0.542 0.639 0.601 0.645 0.56 0.565
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.649 0.574 0.645 0.582 0.645 0.515 0.56 0.453
Protected 0.642 0.646 0.626 0.644 0.636 0.645 0.56 0.566
Logit Pseudo Adam Benign 0.649 0.539 0.645 0.576 0.645 0.434 0.56 0.369
Protected 0.639 0.646 0.587 0.642 0.617 0.646 0.561 0.564
Table A8: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
DermanMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.851 0.854 0.851 0.851 0.791 0.809 0.821 0.821
Protected 0.831 0.851 0.831 0.861 0.792 0.811 0.819 0.821
Adv Logit Pseudo SGD-M Benign 0.851 0.825 0.851 0.807 0.791 0.763 0.821 0.767
Protected 0.411 0.818 0.565 0.818 0.595 0.781 0.386 0.828
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.851 0.775 0.851 0.806 0.791 0.671 0.821 0.611
Protected 0.138 0.892 0.331 0.817 0.449 0.855 0.029 1
Logit Oracle Adam Benign 0.851 0.772 0.851 0.793 0.791 0.529 0.821 0.473
Protected 0.072 0.882 0.202 0.802 0.369 0.885 0.005 1
Logit Max P𝑃Pitalic_P Adam Benign 0.851 0.811 0.851 0.824 0.791 0.786 0.821 0.803
Protected 0.591 0.819 0.608 0.836 0.713 0.791 0.455 0.815
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.851 0.811 0.851 0.823 0.791 0.778 0.821 0.807
Protected 0.463 0.832 0.508 0.833 0.645 0.785 0.529 0.83
Logit Pseudo Adam Benign 0.851 0.798 0.851 0.796 0.791 0.781 0.821 0.81
Protected 0.427 0.825 0.516 0.831 0.672 0.788 0.444 0.835
Anti Adv Logit Pseudo SGD-M Benign 0.851 0.781 0.851 0.829 0.791 0.759 0.821 0.789
Protected 0.781 0.825 0.691 0.845 0.784 0.773 0.718 0.822
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.851 0.779 0.851 0.779 0.791 0.792 0.821 0.805
Protected 0.991 0.995 0.921 0.941 0.871 0.876 0.999 1
Logit Oracle Adam Benign 0.851 0.761 0.851 0.779 0.791 0.764 0.821 0.765
Protected 0.985 0.994 0.896 0.933 0.882 0.875 0.999 1
Logit Max P𝑃Pitalic_P Adam Benign 0.851 0.781 0.851 0.819 0.791 0.787 0.821 0.804
Protected 0.774 0.836 0.662 0.841 0.778 0.788 0.795 0.82
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.851 0.778 0.851 0.825 0.791 0.776 0.821 0.797
Protected 0.791 0.829 0.738 0.844 0.785 0.789 0.735 0.818
Logit Pseudo Adam Benign 0.851 0.781 0.851 0.831 0.791 0.787 0.821 0.804
Protected 0.785 0.836 0.708 0.848 0.781 0.791 0.795 0.821
BreastMNIST Random Noise Benign 0.827 0.925 0.773 0.878 0.833 0.833 0.769 0.778
Protected 0.803 0.938 0.819 0.897 0.831 0.835 0.772 0.777
Adv Logit Pseudo SGD-M Benign 0.827 0.709 0.773 0.716 0.833 0.753 0.769 0.691
Protected 0.334 0.699 0.624 0.661 0.501 0.798 0.331 0.724
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.827 0.573 0.773 0.752 0.833 0.548 0.769 0.441
Protected 0.001 0.988 0.028 0.561 0.001 0.981 0.001 1
Logit Oracle Adam Benign 0.827 0.579 0.773 0.775 0.833 0.623 0.769 0.445
Protected 0.001 0.978 0.017 0.617 0.001 0.917 0.011 1
Logit Max P𝑃Pitalic_P Adam Benign 0.827 0.749 0.773 0.688 0.833 0.796 0.769 0.778
Protected 0.431 0.601 0.567 0.714 0.524 0.841 0.553 0.717
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.827 0.577 0.773 0.698 0.833 0.588 0.769 0.731
Protected 0.292 0.712 0.363 0.691 0.268 0.775 0.364 0.767
Logit Pseudo Adam Benign 0.827 0.647 0.773 0.684 0.833 0.621 0.769 0.755
Protected 0.329 0.712 0.336 0.706 0.301 0.775 0.344 0.765
Anti Adv Logit Pseudo SGD-M Benign 0.827 0.671 0.773 0.807 0.833 0.746 0.769 0.689
Protected 0.738 0.741 0.613 0.761 0.795 0.837 0.739 0.718
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.827 0.781 0.773 0.719 0.833 0.813 0.769 0.766
Protected 1 1 0.999 0.982 1 1 1 1
Logit Oracle Adam Benign 0.827 0.799 0.773 0.684 0.833 0.754 0.769 0.775
Protected 1 1 0.999 0.989 1 1 0.988 1
Logit Max P𝑃Pitalic_P Adam Benign 0.827 0.554 0.773 0.721 0.833 0.751 0.769 0.705
Protected 0.717 0.762 0.611 0.715 0.782 0.801 0.758 0.739
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.827 0.562 0.773 0.711 0.833 0.766 0.769 0.705
Protected 0.816 0.811 0.741 0.694 0.804 0.814 0.755 0.766
Logit Pseudo Adam Benign 0.827 0.559 0.773 0.721 0.833 0.696 0.769 0.705
Protected 0.716 0.765 0.728 0.711 0.781 0.776 0.758 0.739
OctMNIST Random Noise Benign 0.899 0.861 0.943 0.927 0.954 0.919 0.876 0.877
Protected 0.752 0.881 0.847 0.941 0.759 0.931 0.785 0.874
Adv Logit Pseudo SGD-M Benign 0.899 0.607 0.943 0.927 0.954 0.849 0.876 0.597
Protected 0.278 0.876 0.523 0.893 0.203 0.915 0.221 0.856
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.899 0.343 0.943 0.819 0.954 0.815 0.876 0.471
Protected 0.051 1 0.119 0.975 0.147 0.983 0.006 1
Logit Oracle Adam Benign 0.899 0.258 0.943 0.731 0.954 0.622 0.876 0.478
Protected 0 1 0.029 0.991 0.012 0.998 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.899 0.893 0.943 0.911 0.954 0.841 0.876 0.862
Protected 0.499 0.876 0.527 0.895 0.503 0.919 0.518 0.872
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.899 0.661 0.943 0.854 0.954 0.847 0.876 0.621
Protected 0.334 0.887 0.225 0.892 0.251 0.911 0.263 0.858
Logit Pseudo Adam Benign 0.899 0.721 0.943 0.819 0.954 0.784 0.876 0.619
Protected 0.263 0.879 0.179 0.886 0.136 0.921 0.198 0.852
Anti Adv Logit Pseudo SGD-M Benign 0.899 0.849 0.943 0.931 0.954 0.891 0.876 0.763
Protected 0.832 0.887 0.678 0.901 0.893 0.933 0.801 0.862
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.899 0.901 0.943 0.928 0.954 0.933 0.876 0.875
Protected 1 1 0.991 0.996 0.994 0.999 1 1
Logit Oracle Adam Benign 0.899 0.837 0.943 0.921 0.954 0.944 0.876 0.871
Protected 1 1 0.999 1 0.999 0.999 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.899 0.859 0.943 0.911 0.954 0.911 0.876 0.765
Protected 0.826 0.883 0.718 0.891 0.871 0.935 0.797 0.847
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.899 0.862 0.943 0.909 0.954 0.931 0.876 0.808
Protected 0.807 0.898 0.876 0.897 0.898 0.929 0.772 0.876
Logit Pseudo Adam Benign 0.899 0.861 0.943 0.897 0.954 0.926 0.876 0.765
Protected 0.826 0.883 0.866 0.892 0.898 0.933 0.797 0.847
Table A9: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
PneumoniaMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.955 0.921 0.947 0.918 0.952 0.963 0.951 0.956
Protected 0.946 0.924 0.939 0.931 0.939 0.959 0.932 0.946
Adv Logit Pseudo SGD-M Benign 0.955 0.851 0.947 0.869 0.952 0.807 0.951 0.891
Protected 0.321 0.826 0.656 0.911 0.314 0.923 0.308 0.905
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.955 0.805 0.947 0.538 0.952 0.571 0.951 0.843
Protected 0 0.999 0.006 0.955 0.008 1 0 1
Logit Oracle Adam Benign 0.955 0.799 0.947 0.717 0.952 0.584 0.951 0.843
Protected 0 0.999 0.003 0.915 0.018 1 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.955 0.751 0.947 0.829 0.952 0.925 0.951 0.945
Protected 0.739 0.911 0.443 0.917 0.381 0.949 0.723 0.943
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.955 0.819 0.947 0.587 0.952 0.812 0.951 0.891
Protected 0.302 0.825 0.348 0.914 0.309 0.951 0.327 0.903
Logit Pseudo Adam Benign 0.955 0.817 0.947 0.735 0.952 0.839 0.951 0.888
Protected 0.359 0.824 0.361 0.916 0.323 0.925 0.329 0.911
Anti Adv Logit Pseudo SGD-M Benign 0.955 0.847 0.947 0.881 0.952 0.949 0.951 0.941
Protected 0.716 0.818 0.816 0.881 0.898 0.951 0.861 0.944
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.955 0.833 0.947 0.836 0.952 0.921 0.951 0.951
Protected 1 1 1 0.999 1 1 1 1
Logit Oracle Adam Benign 0.955 0.867 0.947 0.853 0.952 0.908 0.951 0.951
Protected 1 1 1 0.999 1 1 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.955 0.838 0.947 0.839 0.952 0.959 0.951 0.932
Protected 0.687 0.836 0.741 0.864 0.887 0.949 0.859 0.937
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.955 0.802 0.947 0.833 0.952 0.948 0.951 0.936
Protected 0.907 0.913 0.861 0.857 0.937 0.951 0.961 0.939
Logit Pseudo Adam Benign 0.955 0.845 0.947 0.832 0.952 0.959 0.951 0.932
Protected 0.687 0.834 0.692 0.838 0.887 0.949 0.859 0.937
RetianMNIST Random Noise Benign 0.665 0.679 0.665 0.688 0.671 0.664 0.723 0.725
Protected 0.654 0.675 0.645 0.671 0.674 0.665 0.721 0.722
Adv Logit Pseudo SGD-M Benign 0.665 0.611 0.665 0.646 0.671 0.673 0.723 0.711
Protected 0.424 0.661 0.479 0.658 0.537 0.651 0.432 0.708
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.665 0.543 0.665 0.643 0.671 0.345 0.723 0.548
Protected 0.021 0.884 0.135 0.642 0.297 0.731 0.031 1
Logit Oracle Adam Benign 0.665 0.568 0.665 0.646 0.671 0.492 0.723 0.375
Protected 0.015 0.888 0.087 0.646 0.182 0.833 0.019 1
Logit Max P𝑃Pitalic_P Adam Benign 0.665 0.641 0.665 0.656 0.671 0.663 0.723 0.718
Protected 0.531 0.648 0.538 0.668 0.544 0.642 0.486 0.673
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.665 0.582 0.665 0.627 0.671 0.561 0.723 0.703
Protected 0.406 0.645 0.398 0.679 0.371 0.662 0.476 0.712
Logit Pseudo Adam Benign 0.665 0.609 0.665 0.691 0.671 0.661 0.723 0.711
Protected 0.433 0.633 0.432 0.677 0.516 0.644 0.491 0.715
Anti Adv Logit Pseudo SGD-M Benign 0.665 0.561 0.665 0.661 0.671 0.611 0.723 0.688
Protected 0.625 0.631 0.612 0.667 0.677 0.695 0.701 0.68
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.665 0.562 0.665 0.627 0.671 0.637 0.723 0.725
Protected 0.999 0.993 0.951 0.876 0.803 0.792 0.999 1
Logit Oracle Adam Benign 0.665 0.625 0.665 0.664 0.671 0.651 0.723 0.712
Protected 0.993 0.995 0.955 0.924 0.861 0.861 0.999 1
Logit Max P𝑃Pitalic_P Adam Benign 0.665 0.533 0.665 0.629 0.671 0.596 0.723 0.695
Protected 0.681 0.687 0.611 0.684 0.632 0.657 0.715 0.694
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.665 0.548 0.665 0.616 0.671 0.603 0.723 0.694
Protected 0.631 0.663 0.659 0.668 0.673 0.653 0.712 0.697
Logit Pseudo Adam Benign 0.665 0.538 0.665 0.652 0.671 0.609 0.723 0.695
Protected 0.684 0.687 0.646 0.627 0.661 0.657 0.715 0.694
PathMNIST Random Noise Benign 0.976 0.981 0.971 0.951 0.969 0.968 0.941 0.944
Protected 0.891 0.981 0.891 0.967 0.911 0.962 0.891 0.942
Adv Logit Pseudo SGD-M Benign 0.976 0.942 0.971 0.959 0.969 0.784 0.941 0.859
Protected 0.351 0.963 0.597 0.954 0.346 0.957 0.228 0.938
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.976 0.862 0.971 0.956 0.969 0.886 0.941 0.782
Protected 0.214 0.986 0.309 0.973 0.408 0.982 0.181 0.998
Logit Oracle Adam Benign 0.976 0.741 0.971 0.953 0.969 0.636 0.941 0.684
Protected 0.176 0.986 0.233 0.969 0.242 0.981 0.117 0.999
Logit Max P𝑃Pitalic_P Adam Benign 0.976 0.961 0.971 0.968 0.969 0.824 0.941 0.924
Protected 0.634 0.966 0.634 0.966 0.451 0.953 0.587 0.944
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.976 0.897 0.971 0.962 0.969 0.905 0.941 0.891
Protected 0.295 0.961 0.387 0.967 0.459 0.954 0.381 0.938
Logit Pseudo Adam Benign 0.976 0.836 0.971 0.961 0.969 0.743 0.941 0.899
Protected 0.257 0.964 0.321 0.967 0.297 0.956 0.234 0.945
Anti Adv Logit Pseudo SGD-M Benign 0.976 0.963 0.971 0.975 0.969 0.924 0.941 0.911
Protected 0.913 0.968 0.843 0.964 0.938 0.964 0.921 0.941
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.976 0.958 0.971 0.957 0.969 0.968 0.941 0.927
Protected 0.991 0.993 0.988 0.993 0.968 0.991 0.998 0.999
Logit Oracle Adam Benign 0.976 0.957 0.971 0.959 0.969 0.963 0.941 0.883
Protected 0.992 0.995 0.991 0.995 0.985 0.991 0.995 0.999
Logit Max P𝑃Pitalic_P Adam Benign 0.976 0.964 0.971 0.954 0.969 0.922 0.941 0.921
Protected 0.792 0.967 0.844 0.957 0.922 0.964 0.921 0.944
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.976 0.952 0.971 0.953 0.969 0.948 0.941 0.936
Protected 0.925 0.957 0.935 0.951 0.924 0.954 0.887 0.942
Logit Pseudo Adam Benign 0.976 0.946 0.971 0.956 0.969 0.917 0.941 0.917
Protected 0.938 0.965 0.938 0.959 0.951 0.961 0.921 0.944
Table A10: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
BloodMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.993 0.991 0.991 0.948 0.895 0.961 0.961 0.957
Protected 0.976 0.991 0.938 0.919 0.882 0.964 0.951 0.956
Adv Logit Pseudo SGD-M Benign 0.993 0.896 0.991 0.941 0.895 0.579 0.961 0.846
Protected 0.076 0.978 0.284 0.969 0.165 0.961 0.069 0.961
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.993 0.939 0.991 0.939 0.895 0.875 0.961 0.736
Protected 0.023 0.996 0.156 0.981 0.324 0.989 0.023 1
Logit Oracle Adam Benign 0.993 0.874 0.991 0.893 0.895 0.677 0.961 0.644
Protected 0.037 0.999 0.089 0.988 0.131 0.971 0 1
Logit Max P𝑃Pitalic_P Adam Benign 0.993 0.965 0.991 0.971 0.895 0.898 0.961 0.929
Protected 0.552 0.979 0.622 0.973 0.469 0.945 0.492 0.966
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.993 0.953 0.991 0.945 0.895 0.924 0.961 0.896
Protected 0.087 0.977 0.232 0.974 0.372 0.957 0.147 0.961
Logit Pseudo Adam Benign 0.993 0.907 0.991 0.928 0.895 0.805 0.961 0.857
Protected 0.102 0.978 0.161 0.974 0.189 0.942 0.078 0.961
Anti Adv Logit Pseudo SGD-M Benign 0.993 0.963 0.991 0.964 0.895 0.903 0.961 0.879
Protected 0.971 0.979 0.926 0.974 0.881 0.956 0.947 0.961
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.993 0.973 0.991 0.971 0.895 0.901 0.961 0.961
Protected 1 1 1 1 0.922 0.931 1 1
Logit Oracle Adam Benign 0.993 0.991 0.991 0.977 0.895 0.885 0.961 0.961
Protected 1 1 1 1 0.907 0.959 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.993 0.955 0.991 0.951 0.895 0.916 0.961 0.898
Protected 0.969 0.978 0.951 0.974 0.861 0.957 0.948 0.961
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.993 0.939 0.991 0.955 0.895 0.925 0.961 0.939
Protected 0.969 0.971 0.968 0.973 0.891 0.938 0.925 0.959
Logit Pseudo Adam Benign 0.993 0.955 0.991 0.963 0.895 0.911 0.961 0.898
Protected 0.969 0.978 0.961 0.977 0.881 0.952 0.948 0.961
OrganCMNIST Random Noise Benign 0.983 0.983 0.982 0.974 0.951 0.938 0.875 0.913
Protected 0.982 0.984 0.977 0.976 0.949 0.948 0.875 0.915
Adv Logit Pseudo SGD-M Benign 0.983 0.982 0.982 0.981 0.951 0.775 0.875 0.788
Protected 0.249 0.981 0.572 0.977 0.527 0.931 0.248 0.878
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.983 0.983 0.982 0.978 0.951 0.896 0.875 0.722
Protected 0.301 0.978 0.359 0.968 0.703 0.938 0.255 0.999
Logit Oracle Adam Benign 0.983 0.754 0.982 0.978 0.951 0.864 0.875 0.678
Protected 0.058 0.994 0.227 0.965 0.618 0.961 0.142 1
Logit Max P𝑃Pitalic_P Adam Benign 0.983 0.984 0.982 0.983 0.951 0.943 0.875 0.871
Protected 0.675 0.978 0.809 0.977 0.738 0.935 0.608 0.925
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.983 0.983 0.982 0.978 0.951 0.908 0.875 0.859
Protected 0.365 0.981 0.419 0.974 0.752 0.926 0.504 0.881
Logit Pseudo Adam Benign 0.983 0.797 0.982 0.981 0.951 0.891 0.875 0.836
Protected 0.151 0.955 0.297 0.976 0.678 0.942 0.385 0.884
Anti Adv Logit Pseudo SGD-M Benign 0.983 0.976 0.982 0.977 0.951 0.923 0.875 0.816
Protected 0.968 0.972 0.971 0.975 0.953 0.951 0.879 0.893
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.983 0.983 0.982 0.978 0.951 0.952 0.875 0.853
Protected 0.999 0.999 0.998 0.998 0.988 0.989 0.996 0.999
Logit Oracle Adam Benign 0.983 0.978 0.982 0.979 0.951 0.951 0.875 0.831
Protected 1 1 0.996 0.998 0.982 0.983 0.996 0.999
Logit Max P𝑃Pitalic_P Adam Benign 0.983 0.969 0.982 0.974 0.951 0.929 0.875 0.841
Protected 0.961 0.965 0.961 0.974 0.956 0.951 0.891 0.884
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.983 0.977 0.982 0.977 0.951 0.941 0.875 0.848
Protected 0.971 0.972 0.972 0.973 0.951 0.951 0.874 0.876
Logit Pseudo Adam Benign 0.983 0.969 0.982 0.976 0.951 0.929 0.875 0.841
Protected 0.961 0.965 0.961 0.973 0.956 0.951 0.891 0.884
OrganAMNIST Random Noise Benign 0.979 0.982 0.992 0.981 0.959 0.953 0.945 0.947
Protected 0.976 0.982 0.987 0.991 0.951 0.958 0.943 0.951
Adv Logit Pseudo SGD-M Benign 0.979 0.846 0.992 0.991 0.959 0.801 0.945 0.823
Protected 0.158 0.961 0.485 0.986 0.543 0.933 0.187 0.925
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.979 0.966 0.992 0.991 0.959 0.928 0.945 0.883
Protected 0.345 0.963 0.405 0.981 0.737 0.951 0.173 0.995
Logit Oracle Adam Benign 0.979 0.829 0.992 0.991 0.959 0.884 0.945 0.817
Protected 0.032 0.996 0.222 0.981 0.611 0.958 0.051 0.998
Logit Max P𝑃Pitalic_P Adam Benign 0.979 0.984 0.992 0.994 0.959 0.932 0.945 0.955
Protected 0.636 0.981 0.816 0.988 0.751 0.943 0.554 0.951
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.979 0.962 0.992 0.991 0.959 0.929 0.945 0.909
Protected 0.353 0.963 0.448 0.985 0.791 0.936 0.353 0.918
Logit Pseudo Adam Benign 0.979 0.851 0.992 0.992 0.959 0.889 0.945 0.842
Protected 0.144 0.952 0.278 0.986 0.681 0.941 0.201 0.934
Anti Adv Logit Pseudo SGD-M Benign 0.979 0.957 0.992 0.992 0.959 0.939 0.945 0.874
Protected 0.957 0.962 0.981 0.984 0.957 0.951 0.898 0.914
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.979 0.979 0.992 0.992 0.959 0.955 0.945 0.945
Protected 1 1 1 1 0.991 0.991 1 1
Logit Oracle Adam Benign 0.979 0.971 0.992 0.992 0.959 0.955 0.945 0.945
Protected 1 1 0.999 1 0.991 0.991 1 1
Logit Max P𝑃Pitalic_P Adam Benign 0.979 0.958 0.992 0.991 0.959 0.943 0.945 0.886
Protected 0.953 0.963 0.977 0.984 0.961 0.956 0.908 0.918
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.979 0.973 0.992 0.991 0.959 0.951 0.945 0.926
Protected 0.962 0.967 0.981 0.983 0.949 0.948 0.902 0.919
Logit Pseudo Adam Benign 0.979 0.958 0.992 0.991 0.959 0.943 0.945 0.886
Protected 0.953 0.963 0.977 0.984 0.961 0.956 0.908 0.918
Table A11: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (logP𝑃\log Proman_log italic_P) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology. Dataset (D) represent benign (original dataset) and protected dataset.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
OrganSMNIST A L T O D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.952 0.945 0.961 0.936 0.929 0.924 0.829 0.821
Protected 0.946 0.949 0.933 0.941 0.924 0.928 0.827 0.826
Adv Logit Pseudo SGD-M Benign 0.952 0.945 0.961 0.957 0.929 0.712 0.829 0.762
Protected 0.307 0.943 0.512 0.953 0.422 0.924 0.313 0.836
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.952 0.941 0.961 0.946 0.929 0.835 0.829 0.657
Protected 0.277 0.948 0.433 0.942 0.629 0.971 0.197 1
Logit Oracle Adam Benign 0.952 0.739 0.961 0.961 0.929 0.747 0.829 0.608
Protected 0.079 0.998 0.223 0.951 0.422 0.972 0.111 1
Logit Max P𝑃Pitalic_P Adam Benign 0.952 0.958 0.961 0.964 0.929 0.915 0.829 0.807
Protected 0.636 0.951 0.777 0.957 0.642 0.918 0.566 0.837
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.952 0.945 0.961 0.953 0.929 0.881 0.829 0.797
Protected 0.438 0.943 0.545 0.951 0.714 0.914 0.545 0.816
Logit Pseudo Adam Benign 0.952 0.817 0.961 0.959 0.929 0.826 0.829 0.789
Protected 0.288 0.911 0.343 0.953 0.519 0.921 0.437 0.832
Anti Adv Logit Pseudo SGD-M Benign 0.952 0.946 0.961 0.955 0.929 0.904 0.829 0.763
Protected 0.949 0.943 0.945 0.952 0.931 0.921 0.811 0.838
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.952 0.951 0.961 0.959 0.929 0.927 0.829 0.801
Protected 1 1 0.998 0.999 0.984 0.988 0.995 1
Logit Oracle Adam Benign 0.952 0.921 0.961 0.958 0.929 0.907 0.829 0.726
Protected 0.999 1 0.989 0.996 0.972 0.973 0.995 1
Logit Max P𝑃Pitalic_P Adam Benign 0.952 0.922 0.961 0.952 0.929 0.901 0.829 0.805
Protected 0.924 0.932 0.936 0.951 0.932 0.921 0.829 0.835
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.952 0.941 0.961 0.951 0.929 0.919 0.829 0.815
Protected 0.931 0.938 0.943 0.951 0.921 0.925 0.812 0.831
Logit Pseudo Adam Benign 0.952 0.922 0.961 0.951 0.929 0.896 0.829 0.805
Protected 0.924 0.933 0.936 0.951 0.932 0.914 0.829 0.835
TissueMNIST Random Noise Benign 0.882 0.824 0.886 0.801 0.863 0.831 0.821 0.798
Protected 0.811 0.857 0.822 0.872 0.761 0.851 0.778 0.811
Adv Logit Pseudo SGD-M Benign 0.882 0.749 0.886 0.853 0.863 0.673 0.821 0.718
Protected 0.298 0.857 0.571 0.844 0.271 0.858 0.258 0.819
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.882 0.389 0.886 0.651 0.863 0.442 0.821 0.489
Protected 0.119 0.997 0.205 0.943 0.165 0.991 0.111 1
Logit Oracle Adam Benign 0.882 0.296 0.886 0.603 0.863 0.374 0.821 0.439
Protected 0.075 0.999 0.105 0.948 0.101 0.979 0.017 1
Logit Max P𝑃Pitalic_P Adam Benign 0.882 0.821 0.886 0.847 0.863 0.808 0.821 0.806
Protected 0.557 0.844 0.618 0.845 0.453 0.848 0.483 0.813
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.882 0.791 0.886 0.822 0.863 0.766 0.821 0.745
Protected 0.353 0.856 0.373 0.846 0.396 0.856 0.378 0.823
Logit Pseudo Adam Benign 0.882 0.753 0.886 0.805 0.863 0.722 0.821 0.743
Protected 0.291 0.857 0.285 0.845 0.294 0.862 0.252 0.817
Anti Adv Logit Pseudo SGD-M Benign 0.882 0.839 0.886 0.871 0.863 0.794 0.821 0.763
Protected 0.841 0.871 0.812 0.871 0.811 0.864 0.801 0.821
logP𝑃\log Proman_log italic_P Oracle Adam Benign 0.882 0.847 0.886 0.863 0.863 0.826 0.821 0.813
Protected 0.991 1 0.968 0.996 0.961 0.988 1 1
Logit Oracle Adam Benign 0.882 0.809 0.886 0.848 0.863 0.791 0.821 0.726
Protected 0.977 1 0.94 0.994 0.924 0.978 0.998 1
Logit Max P𝑃Pitalic_P Adam Benign 0.882 0.835 0.886 0.851 0.863 0.791 0.821 0.756
Protected 0.832 0.871 0.774 0.871 0.798 0.865 0.802 0.821
logP𝑃\log Proman_log italic_P Pseudo Adam Benign 0.882 0.848 0.886 0.852 0.863 0.814 0.821 0.781
Protected 0.831 0.865 0.828 0.861 0.795 0.858 0.771 0.817
Logit Pseudo Adam Benign 0.882 0.838 0.886 0.851 0.863 0.792 0.821 0.757
Protected 0.837 0.871 0.803 0.871 0.803 0.864 0.802 0.821
Table A12: Original Results(ACC) on 3D data. We show original result of 3D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
OrganMNIST3d A T D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.926 0.908 0.895 0.795 0.921 0.895 0.598 0.597
Protected 0.795 0.921 0.664 0.902 0.744 0.923 0.551 0.593
Adv Oracle Benign 0.923 0.901 0.915 0.918 0.926 0.734 0.598 0.303
Protected 0 0.915 0 0.892 0 0.916 0 0.995
Pseudo Benign 0.926 0.911 0.895 0.911 0.921 0.616 0.598 0.398
Protected 0.043 0.889 0.031 0.913 0.038 0.885 0.046 0.613
Anti Adv Oracle Benign 0.923 0.716 0.915 0.828 0.926 0.682 0.598 0.598
Protected 1 1 1 1 0.933 1 1 1
Pseudo Benign 0.923 0.544 0.915 0.821 0.926 0.533 0.598 0.428
Protected 0.923 0.923 0.931 0.931 0.862 0.925 0.598 0.598
NoduleMNIST3d Random Noise Benign 0.848 0.845 0.835 0.848 0.871 0.874 0.816 0.829
Protected 0.858 0.826 0.829 0.848 0.848 0.865 0.801 0.813
Adv Oracle Benign 0.832 0.794 0.835 0.848 0.845 0.319 0.816 0.794
Protected 0.197 1 0 0.845 0 0.997 0.784 1
Pseudo Benign 0.848 0.845 0.835 0.865 0.871 0.813 0.816 0.794
Protected 0.123 0.845 0.155 0.839 0.352 0.858 0.794 0.794
Anti Adv Oracle Benign 0.832 0.813 0.835 0.868 0.845 0.806 0.816 0.842
Protected 1 1 1 1 1 1 0.958 1
Pseudo Benign 0.832 0.855 0.835 0.829 0.845 0.842 0.816 0.529
Protected 0.829 0.852 0.858 0.852 0.861 0.861 0.848 0.206
FractureMNIST3d Random Noise Benign 0.479 0.412 0.533 0.512 0.517 0.483 0.417 0.425
Protected 0.479 0.412 0.533 0.517 0.517 0.483 0.417 0.425
Adv Oracle Benign 0.458 0.338 0.433 0.208 0.5 0 217 0 417 0 212
Protected 0 1 0.433 1 0 0.996 0 1
Pseudo Benign 0.479 0.454 0.533 0.525 0.517 0.471 0.417 0.375
Protected 0.263 0.454 0.308 0.492 0.501 0.446 0.388 0.375
Anti Adv Oracle Benign 0.458 0.371 0.433 0.196 0.501 0.451 0.417 0.404
Protected 1 1 0.433 1 1 1 1 1
Pseudo Benign 0.458 0.254 0.433 0.458 0.501 0.396 0.417 0.329
Protected 0.517 0.508 0.433 0.454 0.501 0.501 0.417 0.392
Table A13: Original Results(AUC) on 3D data. We show original result of 3D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
OrganMNIST3d A T D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.995 0.994 0.989 0.973 0.995 0.992 0.877 0.878
Protected 0.979 0.996 0.953 0.989 0.979 0.995 0.865 0.871
Adv Oracle Benign 0.993 0.992 0.994 0.992 0.988 0.952 0.877 0.756
Protected 0.008 0.992 0 0.988 0.045 0.991 0 0.998
Pseudo Benign 0.995 0.993 0.989 0.996 0.995 0.925 0.877 0.838
Protected 0.096 0.991 0.109 0.994 0.147 0.982 0.206 0.879
Anti Adv Oracle Benign 0.993 0.991 0.994 0.992 0.988 0.973 0.877 0.877
Protected 1 1 1 1 0.994 1 1 1
Pseudo Benign 0.993 0.967 0.994 0.985 0.988 0.955 0.877 0.867
Protected 0.965 0.982 0.981 0.987 0.974 0.982 0.812 0.861
NoduleMNIST3d Random Noise Benign 0.803 0.871 0.822 0.841 0.917 0.912 0.857 0.847
Protected 0.799 0.853 0.791 0.864 0.909 0.896 0.861 0.844
Adv Oracle Benign 0.737 0.761 0.822 0.841 0.862 0.461 0.857 0.429
Protected 0 1 0 0.832 0 1 0 1
Pseudo Benign 0.803 0.868 0.822 0.871 0.917 0.748 0.857 0.765
Protected 0.318 0.866 0.334 0.857 0.359 0.815 0.324 0.765
Anti Adv Oracle Benign 0.737 0.812 0.822 0.865 0.862 0.835 0.857 0.86
Protected 1 1 1 1 1 1 1 1
Pseudo Benign 0.737 0.865 0.822 0.821 0.862 0.819 0.857 0.851
Protected 0.771 0.874 0.771 0.811 0.745 0.836 0.846 0.817
FractureMNIST3d Random Noise Benign 0.577 0.654 0.691 0.716 0.619 0.641 0.564 0.547
Protected 0.577 0.655 0.691 0.716 0.619 0.639 0.564 0.547
Adv Oracle Benign 0.583 0.524 0.473 0.536 0.629 0.531 0.564 0.485
Protected 0 1 0 1 0.007 1 0 1
Pseudo Benign 0.577 0.611 0.691 0.665 0.619 0.647 0.564 0.534
Protected 0.433 0.609 0.444 0.649 0.586 0.651 0.538 0.544
Anti Adv Oracle Benign 0.583 0.539 0.473 0.571 0.629 0.581 0.564 0.563
Protected 1 1 1 1 1 1 1 1
Pseudo Benign 0.583 0.512 0.473 0.577 0.629 0.602 0.564 0.546
Protected 0.633 0.603 0.456 0.579 0.628 0.644 0.552 0.529
Table A14: Raw Results(ACC) on High-Resolution data. We show original result of High-Resolution datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
BloodMNIST A T D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.985 0.982 0.983 0.985 0.986 0.983 0.856 0.857
Protected 0.977 0.983 0.981 0.984 0.974 0.988 0.855 0.864
Adv Oracle Benign 0.985 0.593 0.983 0.543 0.986 0.869 0.856 0.786
Protected 1 1 1 1 1 1 0.989 1
Pseudo Benign 0.985 0.833 0.983 0.866 0.986 0.875 0.856 0.803
Protected 0.985 0.985 0.983 0.983 0.986 0.986 0.856 0.867
Anti Adv Oracle Benign 0.985 0.701 0.983 0.956 0.986 0.778 0.856 0.433
Protected 0 1 0.007 0.992 0.009 0.997 0.146 0.999
Pseudo Benign 0.985 0.784 0.983 0.968 0.986 0.868 0.856 0.881
Protected 0.011 0.985 0.013 0.981 0.011 0.985 0.172 0.926
RetianMNIST Random Noise Benign 0.551 0.541 0.531 0.521 0.547 0.525 0.542 0.527
Protected 0.541 0.545 0.515 0.512 0.555 0.535 0.541 0.531
Adv Oracle Benign 0.551 0.521 0.531 0.323 0.547 0.445 0.542 0.461
Protected 1 1 0.995 0.968 1 0.998 0.841 1
Pseudo Benign 0.551 0.438 0.531 0.482 0.547 0.505 0.542 0.502
Protected 0.551 0.552 0.531 0.517 0.547 0.547 0.542 0.545
Anti Adv Oracle Benign 0.551 0.362 0.531 0.532 0.547 0.537 0.542 0.05
Protected 0 0.94 0 0.515 0 0.537 0.307 1
Pseudo Benign 0.551 0.547 0.531 0.505 0.547 0.532 0.542 0.505
Protected 0.142 0.552 0.117 0.512 0.147 0.527 0.455 0.481
BreastMNIST Random Noise Benign 0.853 0.878 0.808 0.841 0.846 0.859 0.353 0.429
Protected 0.859 0.878 0.795 0.827 0.833 0.859 0.346 0.436
Adv Oracle Benign 0.853 0.269 0.808 0.718 0.846 0.724 0.353 0.276
Protected 1 1 0.962 0.962 1 1 0.846 1
Pseudo Benign 0.853 0.269 0.808 0.641 0.846 0.455 0.353 0.269
Protected 0.841 0.841 0.769 0.776 0.846 0.865 0.718 0.705
Anti Adv Oracle Benign 0.853 0.737 0.808 0.769 0.846 0.859 0.353 0.269
Protected 0 1 0 0.776 0 0.865 0.269 1
Pseudo Benign 0.853 0.782 0.808 0.712 0.846 0.865 0.353 0.269
Protected 0.161 0.872 0.231 0.724 0.154 0.878 0.269 0.269
OrganAMNIST Random Noise Benign 0.951 0.949 0.952 0.946 0.959 0.959 0.871 0.881
Protected 0.864 0.949 0.921 0.951 0.933 0.959 0.843 0.879
Adv Oracle Benign 0.951 0.849 0.952 0.807 0.959 0.925 0.871 0.709
Protected 1 1 1 0.999 1 1 0.955 1
Pseudo Benign 0.951 0.761 0.952 0.881 0.959 0.902 0.871 0.493
Protected 0.951 0.951 0.952 0.952 0.959 0.959 0.832 0.871
Anti Adv Oracle Benign 0.951 0.656 0.952 0.951 0.959 0.609 0.871 0.095
Protected 0 0.988 0.002 0.941 0.002 0.986 0 1
Pseudo Benign 0.951 0.571 0.952 0.951 0.959 0.603 0.871 0.182
Protected 0.017 0.951 0.014 0.945 0.002 0.958 0.022 0.871
PneumoniaMNIST Random Noise Benign 0.881 0.862 0.843 0.853 0.837 0.831 0.812 0.814
Protected 0.871 0.871 0.849 0.859 0.796 0.856 0.816 0.821
Adv Oracle Benign 0.881 0.625 0.843 0.625 0.837 0.663 0.812 0.812
Protected 1 0.998 1 0.998 1 1 1 1
Pseudo Benign 0.881 0.625 0.843 0.702 0.837 0.856 0.812 0.667
Protected 0.867 0.865 0.833 0.832 0.832 0.832 0.804 0.801
Anti Adv Oracle Benign 0.881 0.577 0.843 0.375 0.837 0.845 0.812 0.511
Protected 0 1 0 0.982 0 0.97 0 1
Pseudo Benign 0.881 0.843 0.843 0.607 0.837 0.814 0.812 0.625
Protected 0.133 0.867 0.167 0.841 0.168 0.837 0.196 0.804
Table A15: Raw Results(AUC) on High-Resolution data. We show original result of High-Resolution datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.
DataSet Method ResNet-18 ResNet-50 VGG-16 ConvNext-t
BloodMNIST A T D Surrogate Protected Surrogate Protected Surrogate Protected Surrogate Protected
Random Noise Benign 0.999 0.999 0.998 0.998 0.999 0.999 0.961 0.968
Protected 0.999 0.999 0.998 0.998 0.999 0.999 0.955 0.965
Adv Oracle Benign 0.999 0.993 0.998 0.985 0.999 0.996 0.961 0.953
Protected 1 1 1 1 1 1 0.998 1
Pseudo Benign 0.999 0.983 0.998 0.983 0.999 0.987 0.961 0.957
Protected 0.992 0.997 0.992 0.996 0.997 0.998 0.956 0.975
Anti Adv Oracle Benign 0.999 0.937 0.998 0.996 0.999 0.982 0.961 0.825
Protected 0.094 1 0.138 0.999 0.066 1 0.496 1
Pseudo Benign 0.999 0.953 0.998 0.996 0.999 0.991 0.961 0.979
Protected 0.106 0.997 0.152 0.998 0.075 0.999 0.566 0.983
RetianMNIST Random Noise Benign 0.708 0.694 0.687 0.696 0.722 0.706 0.719 0.722
Protected 0.713 0.694 0.684 0.691 0.717 0.705 0.721 0.723
Adv Oracle Benign 0.708 0.691 0.687 0.682 0.722 0.689 0.719 0.675
Protected 1 1 1 0.996 1 1 0.971 1
Pseudo Benign 0.708 0.599 0.687 0.656 0.722 0.681 0.719 0.698
Protected 0.702 0.684 0.694 0.673 0.724 0.735 0.722 0.707
Anti Adv Oracle Benign 0.708 0.621 0.687 0.705 0.722 0.729 0.719 0.546
Protected 0 0.969 0.001 0.692 0 0.717 0.282 1
Pseudo Benign 0.708 0.691 0.687 0.709 0.722 0.726 0.719 0.722
Protected 0.414 0.695 0.334 0.705 0.349 0.723 0.634 0.712
BreastMNIST Random Noise Benign 0.846 0.873 0.812 0.861 0.853 0.897 0.558 0.621
Protected 0.841 0.868 0.801 0.846 0.855 0.912 0.563 0.624
Adv Oracle Benign 0.846 0.835 0.813 0.821 0.853 0.856 0.558 0.638
Protected 1 1 1 0.997 1 1 0.957 1
Pseudo Benign 0.846 0.661 0.813 0.851 0.853 0.894 0.558 0.531
Protected 0.791 0.789 0.831 0.841 0.821 0.895 0.588 0.607
Anti Adv Oracle Benign 0.846 0.752 0.813 0.839 0.853 0.901 0.558 0.412
Protected 0 1 0 0.821 0 0.891 0 1
Pseudo Benign 0.846 0.871 0.813 0.861 0.853 0.896 0.558 0.613
Protected 0.222 0.861 0.252 0.861 0.211 0.901 0.404 0.434
OrganAMNIST Random Noise Benign 0.996 0.996 0.996 0.995 0.997 0.997 0.968 0.971
Protected 0.988 0.996 0.993 0.996 0.994 0.997 0.965 0.969
Adv Oracle Benign 0.996 0.993 0.996 0.993 0.997 0.995 0.968 0.969
Protected 1 1 1 1 1 1 1 0.999
Pseudo Benign 0.996 0.985 0.996 0.991 0.997 0.993 0.968 0.935
Protected 0.972 0.991 0.976 0.989 0.988 0.993 0.927 0.956
Anti Adv Oracle Benign 0.996 0.954 0.996 0.996 0.997 0.915 0.968 0.649
Protected 0.093 0.999 0.042 0.995 0.053 0.999 0.011 1
Pseudo Benign 0.996 0.948 0.996 0.996 0.997 0.918 0.968 0.758
Protected 0.137 0.993 0.081 0.995 0.079 0.995 0.088 0.961
PneumoniaMNIST Random Noise Benign 0.963 0.961 0.947 0.963 0.947 0.935 0.876 0.88
Protected 0.968 0.966 0.942 0.961 0.939 0.945 0.864 0.873
Adv Oracle Benign 0.963 0.928 0.947 0.821 0.947 0.914 0.876 0.876
Protected 1 1 1 1 1 1 1 1
Pseudo Benign 0.963 0.925 0.947 0.918 0.947 0.955 0.876 0.623
Protected 0.753 0.921 0.651 0.882 0.636 0.915 0.589 0.907
Anti Adv Oracle Benign 0.963 0.886 0.947 0.773 0.947 0.951 0.876 0.56
Protected 0 1 0 0.999 0 0.998 0 1
Pseudo Benign 0.963 0.894 0.947 0.915 0.947 0.961 0.876 0.571
Protected 0.277 0.846 0.344 0.963 0.407 0.899 0.421 0.902