¹¹institutetext: The Chinese University of Hong Kong Shenzhen, Guangdong, China ²²institutetext: Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland

DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection

Sihan Shang 11 Jiancheng Yang 22 Zhenglong Sun 11 Pascal Fua 22

Abstract

In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by “cooking” the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must be also “cooked” through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The mechanism behind DataCook is by crafting anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv) that aim to confuse models. Similar to Adv, AntiAdv introduces imperceptible perturbations, ensuring that the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D/3D data and the high-resolution variants. The outcomes indicate that DataCook effectively meets its objectives, preventing models trained on AntiAdv from analyzing unauthorized data effectively, without compromising the validity and accuracy of the data in legitimate scenarios. Code and data are available at https://github.com/MedMNIST/DataCook.

Keywords:

Data Security Copyright Protection Data-Centric ML.

1 Introduction

In the healthcare field, the rapid development and widespread application of deep learning technology have brought the issues of healthcare data misuse and copyright protection into sharp focus. Healthcare data sensitivity and confidentiality necessitate not only the protection of patients’ privacy rights but also the safeguarding of data providers’ and creators’ intellectual property rights [36, 48, 49]. Current data copyright protection mechanisms, such as data encryption[42, 43], anonymization [37] and Digital Rights Management [38, 41], although can prevent illegal copying and dissemination, yet they do not address the potential data extraction and generalization capabilities of deep learning models, making it difficult to effectively tackle the misuse of data in unauthorized training.

As illustrated in Fig. 1, watermarking is a widely adopted approach for copyright protection that embeds distinct watermarks into images to offer visual security. This method includes specific modifications designed to prevent unauthorized deep learning models from analyzing the images effectively [39]. However, this technique reduces the images’ usability for legitimate training purposes which means the model trained on watermarking data has crush perform when deploying the model for original test data or watermarking test data. To restore data value for model deployment, it is necessary to remove the watermarks for model development. However, this leaves the data as vulnerable as those without any watermarking, exposing them to the risk of unauthorized use and data breaches. The challenge lies in protecting the data from misuse in unauthorized training scenarios without compromising its deployment utility. To tackle this issue, we introduce a novel setting DataCook aimed at resolving these challenges.

Refer to caption — Figure 1: A Comparison of Data Copyright Protection. Unprotected: Unprotected raw data can be directly utilized by unauthorized third parties for model development and deployment phases. Watermarking: Users receive watermarked data and authorized removal methods to obtain raw data for model development. yet post-removal, raw data remains at risk of breach and access by unauthorized third parties. DataCook: Users receive DataCook-processed data for model development. If data is breached, Models trained on this processed data are designed to work solely with datasets through DataCook-processed.

DataCook, leverages a novel strategy of generating anti-adversarial examples to safeguard healthcare data copyright during the model deployment. Essentially, DataCook operates by “cooking” the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must be also “cooked” through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The analogy inspiring DataCook’s name draws from the preference for cooked over raw food observed in daily life, symbolizing how processed data retains its value for authorized uses but becomes indigestible to models not adapted to its original form. Through experiments on the publicly available MedMNIST [9, 10] medical image dataset, we have demonstrated that DataCook effectively prevents unauthorized deep learning models from accurately recognizing raw data, while preserving the original characteristics and utility of the data. This not only showcases the potential application of anti-adversarial examples in the copyright protection of healthcare data but also paves a new path for future work in the protection of healthcare data privacy.

2 Method

2.1 Background

To clearly present DataCook, we contrast it with unprotected data and watermarking [39] as shown in Fig. 1, highlighting DataCook’s difference and advantages. To better explain the comparison, We define a raw image dataset as $\mathcal{D}^{r}=\{(x^{r}_{i},y_{i})\}_{i=1}^{N}$ , where the input space $\mathcal{X}^{r}=\{x^{r}_{i}\}_{i=1}^{N}\subset\mathbb{R}^{d}$ corresponds to the raw images and the label space $\mathcal{Y}=\{y_{i}\}_{i=1}^{N}=\{1,\ldots,c\}$ represents their associated labels. Here, $N$ denotes the number of samples, drawn from the distribution $\mathcal{D}^{r}$ , covering the combined space of inputs and labels, $\mathcal{X}^{r}\times\mathcal{Y}$ . A classification model $f^{r}=f^{r}_{\theta^{r}}:\mathcal{X}^{r}\rightarrow\mathbb{R}^{c}$ with logits output is parameterized by $\theta^{r}$ , optimized based on $\mathcal{D}^{r}$ . This process is denoted as $\theta^{r}=\theta^{r}(\{\mathcal{X}^{r},\mathcal{Y}\})$ . Additionally, a protected image dataset, derived from the raw image dataset via specific processing steps, introduces a distribution $\mathcal{D}^{p}$ . This protected image dataset maintains the same label space $\mathcal{Y}$ but occupies a modified input space $\mathcal{X}^{p}=\{x^{p}_{i}\}_{i=1}^{N}\subset\mathbb{R}^{d}$ , ensuring that the dataset spans the domain $\mathcal{X}^{p}\times\mathcal{Y}$ . A corresponding classification model $f^{p}_{\theta^{p}}:\mathcal{X}^{p}\rightarrow\mathbb{R}^{c}$ with logits output, parameterized by $\theta^{p}=\theta^{p}(\{\mathcal{X}^{p},\mathcal{Y}\})$ , is trained on $\mathcal{D}^{p}$ and denoted as $f^{p}$ .

Unprotected: Raw data can be directly utilized by unauthorized third parties for model development and deployment phases.

Watermarking [39]: Users receive watermarked data along with authorized methods for watermark removal to access the raw data. In the deployment stage, if model $f^{p}$ is trained on watermark-protected data, model $f^{p}$ exhibits an inability to classify both watermark-test data and raw test data effectively (crush performance). To preserve the model’s classification capability (normal performance), watermarks must be removed before model development. Once the watermark is removed and occurs data breaches, the data is vulnerable without any protective measures.

DataCook: The proposed DataCook transforms raw data into protected data before dataset distribution. In the deployment stage, models $f^{p}$ trained on DataCook-protected data achieve normal performance on DataCook-test data but have crush performance on raw test data. Even occurs a data breach after distribution and unauthorized third-party use leaks data for model development, new data still need DataCook processed to ensure the model’s classification capability in the model development stage where only data from authorized users can undergo DataCook processing. This process grants copyright holders control over authorization during the deployment phase and effectively mitigates data breach risks.

2.2 DataCook: Problem Setting

To achieve DataCook, the primary objective is to minimize the performance of model $f^{p}$ when it is evaluated on the raw image dataset $\mathcal{D}^{r}$ to prioritize data copyright protection. Concurrently, it is vital to ensure that the performance of model $f^{p}$ on the protected image dataset $\mathcal{D}^{p}$ closely matches the performance of model $f^{r}$ on the raw image dataset $\mathcal{D}^{p}$ to preserves model normal performance. Additionally, to ensure similarity, the distance between the raw image space $\mathcal{X}^{r}$ and the protected image space $\mathcal{X}^{p}$ must be minimized. we adopted the Structural Similarity (SSIM) index as a distance metric between the raw image space and the protected image space, aiming to minimize the intrusiveness of the transformation. Thus, The optimization task can be formalized as:

$\displaystyle\underset{\mathcal{X}^{p}}{\text{minimize}}$	$\displaystyle\mathcal{E}(f^{p},\mathcal{D}^{r})$	(1)
w.r.t.	$\displaystyle\|\mathcal{E}(f^{p},\mathcal{D}^{p})-\mathcal{E}(f^{r},\mathcal{D}% ^{r})\|\leq\epsilon,$	(2)
	$\displaystyle d(\mathcal{X}^{r},\mathcal{X}^{p})\leq\alpha$	(3)

where $\mathcal{E}(f,\mathcal{D})$ denotes the evaluation metric for model $f$ on dataset $\mathcal{D}$ ; $\alpha$ is a threshold for the distance between $\mathcal{X}^{r}$ and $\mathcal{X}^{p}$ ; $\epsilon$ Represents an acceptable performance discrepancy, ideally zero; $f^{p}=f^{p}_{\theta^{p}}$ is parameterized by $\theta^{p}=\theta^{p}(\{\mathcal{X}^{p},\mathcal{Y}\})$ .

2.3 Crafting Anti-Adversarial Examples

Our strategy capitalizes on the potential of adversarial examples, which we have found to be particularly suited for resolving this optimization task. Adversarial examples are crafted by introducing a subtle perturbation $\zeta$ to the input raw image $x^{r}_{i}$ as shown in Eq. 4, aimed at minimally yet significantly impacting the model’s prediction accuracy [47].

x^{p}_{i}=\arg\max_{\zeta}\mathcal{L}(f^{r}_{\theta}(x^{r}_{i}+\zeta),\hat{y})\

(4)

Perturbations are informed by data gradients, making them a targeted feature rather than mere noise [28]. Their subtlety and difficulty in detection ensure that adversarial examples satisfy the similarity criteria in Eq. 3 and adversarial examples serving in adversarial training to bolster model robustness [31, 46, 45, 2] which align with Eq. 2 to maintain model performance. However, the potential of adversarial training to reduce model performance led us to develop the concept of anti-adversarial examples [40, 44]. Contrary to traditional adversarial examples that aim to degrade performance, anti-adversarial examples are designed to improve it, directly addressing the criteria of Eq. 2. Our method confronts the complex challenge delineated by Eq. 1, with forthcoming experiments aimed at validating that our anti-adversarial examples not only adhere to the guidelines of Eq. 2 and Eq. 3 but also achieve the overarching goals of Eq. 1.

As illustrated in Fig. 2, we detail the process of generating anti-adversarial examples. Firstly, employing a pre-trained surrogate model $f^{r}$ to predict the output of a raw image $x^{r}_{i}$ and use the highest probability output as a pseudo-label $\hat{y}_{i}$ , anti-adversarial examples are crafted by maximizing the logistic probability of the pseudo label as shown in Eq. 5, contrasting with adversarial examples that aim to minimize this probability in Eq. 4:

x^{p}_{i}=\arg\min_{\zeta}\mathcal{L}(f^{r}_{\theta}(x^{r}_{i}+\zeta),\hat{y})\

(5)

where $\mathcal{L}$ is the loss function tailored to maximize the logistic probability of $\hat{y}_{i}$ directly. We set SSIM threshold of 0.8 to maintain visual similarity.

3 Experiments

In this section, we evaluate our DataCook in 2D datasets( $28\times 28$ ) , 3D datasets( $28\times 28\times 28$ ) and high-resolution datasets( $224\times 224$ ). Our experiment results show that the anti-adversarial examples crafted through DataCook fulfill the objectives of the optimization task. To assess the effectiveness of anti-adversarial examples, we designed comparative experiments using various optimization conditions.

3.1 Setup

Evaluation Metrics.

Our evaluation of anti-adversarial examples hinges on two key metrics: Copyright Protection (CP) and Performance Preservation (PP). CP represent the definition of Eq.1. To facilitate effect comparison, CP is determined as $\mathcal{E}(f^{p},\mathcal{D}^{r})-\mathcal{E}(f^{r},\mathcal{D}^{r})$ and lower values indicate better protection. PP outlined in Eq. 2, we define PP = - $|\mathcal{E}(f^{p},\mathcal{D}^{p})-\mathcal{E}(f^{r},\mathcal{D}^{r})|$ and values nearing to zero denoting optimal model performance preservation. We use the Area Under the ROC Curve (AUC)[26] and accuracy (ACC) for quantitative analysis. Experiment results show alignment between AUC and ACC, but here we focus on ACC due to space limitations, with detailed AUC available in supplementary materials.

Method Comparison.

To quantitatively evaluate DataCook’s performance, we utilize several benchmarking methods for comparison: Random Noise: Introducing random Gaussian noise as a basic benchmark; Adversarial Target: This theoretical benchmark, represented as an oracle state with provided true data labels, aims to delineate the upper limits of performance, particularly infeasible during deployment; Adv/AntiAdv: Differentiating adversarial from anti-adversarial examples to clarify the practical implications of each strategy.

Datasets.

To rigorously evaluate our method, we utilize the MedMNIST dataset, a comprehensive repository of pre-processed medical images available for public use, cited from both medmnistv2 and medmnistv1. MedMNIST is tailored for benchmarking machine learning models on a wide array of medical image analysis tasks, encompassing diverse data types such as 2D, 3D, and high-resolution images. This diversity positions MedMNIST as an exemplary dataset for testing the efficacy and versatility of our approach across varied medical imaging contexts. Our evaluation strategy involves computing the average performance across all MedMNIST subsets, offering a detailed assessment of our method’s capability in the sophisticated domain of medical image analysis, ensuring both logical coherence and succinctness in our validation process.

Configuration.

Our experiments leveraged PyTorch 2.0 [1] for evaluating wide applicability of DataCook across diverse model architectures, including ResNet18, ResNet50 [5], VGG16 [7] and ConvNet [6]. Training for 2D, 3D, and high-resolution datasets involved 200 epochs using Stochastic Gradient Descent [24], with batch size of 128, momentum of 0.9, and learning rate of $1\times 10^{-3}$ . For protected dataset generation, a uniform learning rate of $5\times 10^{-3}$ was applied. All protected data achieved an SSIM over 0.8 to the corresponding raw data.

3.2 Quantitative Experiments on 2D Data

Settings.

The MedMNIST 2D dataset includes 12 subsets, ranging from 100 to over 100,000 images. We analyzed 11 of these datasets—excluding Chest [12] due to its multi-label focus, which our method doesn’t currently support. The datasets include Path [11], Derma [13, 14], OCT [15], Pneumonia, Retina, Breast [16], Blood [17], Tissue [18], OrganA, OrganC, and OrganS. Despite the limitation, we foresee potential future adaptation to multi-label tasks. In addition to standard 2D datasets, MedMNIST also provides high-resolution versions $224\times 224$ of 2D datasets. Given computational constraints, we particularly focused on Blood [17], Retina, OrganA, Breast [16], and Pneumonia for deeper investigation.

Table 1: Protection Results on 2D datasets. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in ACC of CP and PP across all 2D datasets; Underline data entries highlight key comparative findings.

		ResNet-18		ResNet-50		VGG-16		ConvNext-t
Method		CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$
Random Noise		-2.65	-1.61	-4.43	-2.01	-3.43	-1.92	-1.05	-2.08
Adv	Oracle	-35.17	-17.66	-16.52	-7.44	-42.58	-12.79	-39.59	-32.21
Adv	Pseudo	-27.02	-1.58	-11.99	-4.51	-24.95	-2.41	-18.37	-3.54
AntiAdv	Oracle	-13.15	-22.14	-7.68	-16.45	-12.27	-18.33	-7.91	-32.4
AntiAdv	Pseudo	-18.78	-0.67	-9.63	-0.91	-16.1	-0.56	-13.8	-1.59

Results.

As shown in Tab. 1, our analysis reveals three key insights. First, perturbations introduced by DataCook significantly outperform random noise in enhancing copyright protection, effectively demonstrating that Anti Adv examples can achieve the optimization objectives outlined in Eq. 1. Second, the Oracle state, despite offering better copyright protection, substantially impairs model performance as shown in data with underline, underscoring a crucial trade-off. This finding affirms that the use of pseudo-labels does not compromise model performance. Third, the juxtaposition of Adv and Anti Adv examples indicates that while Anti Adv examples may provide marginally less robust copyright protection, they excel in preserving model performance. This aligns with our main goal of copyright protection without impacting model performance.

Ablation on Optimization Procedure.

Our study includes ablation experiments focusing on optimization procedures, which examine the impact of external adversarial targets, various loss functions, and optimizers. The results are depicted in supplementary materials. We show the current method to craft anti-adversarial examples by pseudo label is the best.

Generalize to High-Resolution Data.

We also conducted tests on the DataCook using high-resolution data, with the results detailed in the supplementary materials. These outcomes are consistent with those obtained from 2D data, demonstrating theDataCook robustness and applicability across different data formats.

3.3 Quantitative Experiments on 3D Data

Settings.

The MedMNIST 3D datasets include six subsets, ranging from from 1,500 to 200 images. Given computational constraints, we focused on OrganMNIST3D [19], NoduleMNIST3D, and FractureMNIST3D.

Table 2: Protection Results on 3D data. We evaluate CP and PP on the 3D data datasets comparing various model. The reported values represent the average change in ACC of CP and PP across all 3D datasets. Underline data entries highlight key comparative findings.

		ResNet-18		ResNet-50		VGG-16		ConvNext-t
Method		CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$
Random Noisy		-2.94	-3.13	-3.6	-1.2	-1.87	-1.43	0.67	-0.53
Adv	Oracle	-6.03	-23.93	-6.97	-20.01	-33.37	-21.93	-17.41	-38.81
Adv	Pseudo	-1.44	-2.16	1.26	-2.11	-13.61	-3.96	-8.81	-2.63
AntiAdv	Oracle	-10.44	-26.23	-9.71	-27.23	-11.11	-24.31	0.43	-38.96
AntiAdv	Pseudo	-18.67	-1.75	-1.81	-1.35	-26.7	-0.91	-15.27	-1.01

Results.

Tab. 2 presents the outcomes from the 3D datasets, revealing trends consistent with those found in the 2D counterparts, yet with distinct nuances. Notably, Anti Adv examples demonstrate superior performance in CP and PP over Adv examples in larger datasets as shown in data with underline, a phenomenon more evident that Anti Adv example can deepen its effects in the context of 3D data. However, this protective efficacy diminishes in deeper architectures like ResNet50, hinting that such networks might overlook perturbations in 3D data. This observation suggests an avenue for enhancement in future research. Intriguingly, pseudo labels yield better results than the Oracle in ResNet18 and ConvNet models, suggesting that providing data labels might lead to reduced performance with 3D datasets, highlighting the complexity of model-data interactions in 3D spaces.

Analysis on DataCook-Generated Perturbation

As illustrated in Fig. 3. The selected model exhibits identical training and testing results across raw, protected, and perturbed data, and this effect persists across other models as well. Firstly, this indicates a stable capacity for performance perseverance. Secondly, it suggests that perturbation serves as an effective feature, impacting both model learning and performance. In comparison to protected data, raw data lacks the effective feature of perturbation, which is essential for validating Eq. 1. The bottom figure shows that despite the introduction of protective perturbations, the visual alterations to protected data are subtle to the human eye. This reveals that DataCook, through its integration of label information via classifiable perturbations, does not degrade model performance or visual perceptibly, contrasting with data compression methods[34, 35] that might compromise the integrity of raw data. Our analysis demonstrates that DataCook is designed to offer complete confidentiality and obscurity, enabling its use without detection by others, thus significantly increasing its practical utility.

4 Conclusion

In conclusion, this study introduces DataCook, a pioneering approach for healthcare data copyright protection during the deployment phase, effectively addressing the challenges of unauthorized third-party misuse. Through the innovative use of anti-adversarial examples, DataCook ensures that data remains protected and only accessible by authorized users, without compromising data integrity and model performance.

References

[1] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” vol. 32, 2019.
[2] J. Yang, Y. Jiang, X. Huang, B. Ni, and C. Zhao, “Learning black-box attackers with transferable priors and query feedback,” vol. 33, pp. 12 288–12 299, 2020.
[3] J. Ma, Y. Zhang, S. Gu, C. Zhu, C. Ge, Y. Zhang, X. An, C. Wang, Q. Wang, X. Liu, S. Cao, Q. Zhang, S. Liu, Y. Wang, Y. Li, J. He, and X. Yang, “Abdomenct-1k: Is abdominal organ segmentation a solved problem?” 2021.
[4] J. Yang, X. Huang, Y. He, J. Xu, C. Yang, G. Xu, and B. Ni, “Reinventing 2d convolutions for 3d images,” vol. 25, no. 8, pp. 3009–3018, 2021.
[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016, pp. 770–778.
[6] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” 2022, pp. 11 976–11 986.
[7] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International conference on learning representations, 2015.
[8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” International Conference on Learning Representations, 2021.
[9] J. Yang, R. Shi, D. Wei, Z. Liu, L. Zhao, B. Ke, H. Pfister, and B. Ni, “Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification,” Scientific Data, vol. 10, no. 1, p. 41, 2023.
[10] J. Yang, R. Shi, and B. Ni, “Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis,” in IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, pp. 191–195.
[11] J. N. Kather, J. Krisam et al., “Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study,” PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.
[12] X. Wang, Y. Peng et al., “Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in CVPR, 2017, pp. 3462–3471.
[13] P. Tschandl, C. Rosendahl, and H. Kittler, “The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific data, p. 180161, 2018.
[14] N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti et al., “Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic),” arXiv preprint arXiv:1902.03368, 2019.
[15] D. S. Kermany, M. Goldbaum et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.
[16] W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, “Dataset of breast ultrasound images,” Data in Brief, vol. 28, p. 104863, 2020.
[17] A. Acevedo, A. Merino, S. Alférez, Ángel Molina, L. Boldú, and J. Rodellar, “A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,” Data in Brief, vol. 30, p. 105474, 2020.
[18] V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter, “Annotated high-throughput microscopy image sets for validation.” Nature methods, vol. 9, no. 7, pp. 637–637, 2012.
[19] P. Bilic, P. F. Christ et al., “The liver tumor segmentation benchmark (lits),” CoRR, vol. abs/1901.04056, 2019.
[20] X. Xu, F. Zhou et al., “Efficient multiple organ localization in ct image using 3d region proposal network,” IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.
[21] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, E. A. Kazerooni, H. MacMahon, E. J. R. van Beek, D. Yankelevitz, A. M. Biancardi, P. H. Bland, M. S. Brown, R. M. Engelmann, G. E. Laderach, D. Max, R. C. Pais, D. P.-Y. Qing, R. Y. Roberts, A. R. Smith, A. Starkey, P. Batra, P. Caligiuri, A. Farooqi, G. W. Gladish, C. M. Jude, R. F. Munden, I. Petkovska, L. E. Quint, L. H. Schwartz, B. Sundaram, L. E. Dodd, C. Fenimore, D. Gur, N. Petrick, J. Freymann, J. Kirby, B. Hughes, A. Vande Casteele, S. Gupte, M. Sallam, M. D. Heath, M. H. Kuhn, E. Dharaiya, R. Burns, D. S. Fryd, M. Salganicoff, V. Anand, U. Shreter, S. Vastagh, B. Y. Croft, and L. P. Clarke, “The lung image database consortium (lidc) and image database resource initiative (idri): A completed reference database of lung nodules on ct scans,” Medical Physics, vol. 38, no. 2, pp. 915–931, 2011.
[22] L. **, J. Yang, K. Kuang, B. Ni, Y. Gao, Y. Sun, P. Gao, W. Ma, M. Tan, H. Kang, J. Chen, and M. Li, “Deep-learning-assisted detection and segmentation of rib fractures from ct scans: Development and validation of fracnet,” EBioMedicine, vol. 62, p. 103106, 2020.
[23] X. Yang, D. Xia, T. Kin, and T. Igarashi, “Intra: 3d intracranial aneurysm dataset for deep learning,” in CVPR, June 2020.
[24] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 1139–1147.
[25] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in IJCAI, vol. 14, no. 2, 1995, pp. 1137–1145.
[26] A. P. Bradley, “The use of the area under the roc curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, pp. 1145–1159, 1997.
[27] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
[28] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” in Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., 2019.
[29] Q. Wang, F. Liu, Y. Zhang, J. Zhang, C. Gong, T. Liu, and B. Han, “Watermarking for out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 35, pp. 15 545–15 557, 2022.
[30] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” arXiv preprint arXiv:1706.02690, 2017.
[31] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[32] T. N. Le, T. Gu, H. H. Nguyen, and I. Echizen, “Rethinking adversarial examples for location privacy protection,” in 2022 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, December 2022, pp. 1–6.
[33] J. Liu, C. P. Lau, and R. Chellappa, “Diffprotect: Generate adversarial examples with diffusion models for facial privacy protection,” arXiv preprint arXiv:2305.13625, 2023.
[34] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” arXiv preprint arXiv:1802.01436, 2018.
[35] D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
[36] W. N. Price and I. G. Cohen, “Privacy in the age of medical big data,” Nature Medicine, vol. 25, no. 1, pp. 37–43, 2019.
[37] J. Yoon, M. Mizrahi, N. F. Ghalaty, T. Jarvinen, A. S. Ravi, P. Brune et al., “Ehr-safe: generating high-fidelity and privacy-preserving synthetic electronic health records,” NPJ Digital Medicine, vol. 6, no. 1, p. 141, 2023.
[38] A. Mohanarathinam, S. Kamalraj, G. K. D. Prasanna Venkatesan, R. V. Ravi, and C. S. Manikandababu, “Digital watermarking techniques for image security: a review,” Journal of Ambient Intelligence and Humanized Computing, vol. 11, pp. 3221–3229, 2020.
[39] X. Wei, B. Pu, S. Zhao, C. Chi, and H. Fu, “Preventing unauthorized ai over-analysis by medical image adversarial watermarking,” 2023.
[40] M. Alfarra, J. C. Pérez, A. Thabet, A. Bibi, P. H. Torr, and B. Ghanem, “Combating adversaries with anti-adversaries,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6, June 2022, pp. 5992–6000.
[41] Y. Yang et al., “A digital mask to safeguard patient privacy,” Nature Medicine, vol. 28, pp. 1883–1892, 2022.
[42] X. Guo, M. A. Khalid, I. Domingos, A. L. Michala, M. Adriko, C. Rowel et al., “Smartphone-based dna diagnostics for malaria detection using deep learning for local decision support and blockchain technology for security,” Nature Electronics, vol. 4, no. 8, pp. 615–624, 2021.
[43] J. C.-S. Cheung, “Vaccination: keep records secure with blockchain,” Nature, vol. 590, pp. 389–390, 2021.
[44] Q. Wang, F. Liu, Y. Zhang, J. Zhang, C. Gong, T. Liu, and B. Han, “Watermarking for out-of-distribution detection,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 15 545–15 557.
[45] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in International Conference on Machine Learning. PMLR, May 2019, pp. 7472–7482.
[46] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
[47] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[48] M. Ienca, “Don’t pause giant ai for the wrong reasons,” Nat. Mach. Intell., pp. 1–2, 2023.
[49] B. Smith, “Stop talking about tomorrow’s ai doomsday when ai poses risks today,” Nature, vol. 618, pp. 885–886, 2023.

Supplementary Materials
DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection

Sihan ShangJiancheng YangZhenglong Sun Pascal Fua

Table A1: Protection Results(ACC) on 2D data. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in ACC of CP and PP across all 2D datasets. Target (T)delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

) as the loss criterion. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Optimizer (O) specifies the choice of optimizer for image updates, enhancing the clarity and logical flow of the evaluation methodology.

Method				ResNet-18		ResNet-50		VGG-16		ConvNext-t
	L	T	O	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$
Random Noise				-2.65	-1.61	-4.43	-2.01	-3.43	-1.92	-1.05	-2.08
Adv	Logit	Oracle	Adam	-35.17	-17.66	-16.52	-7.44	-42.58	-12.79	-39.59	-32.21
	$\log P$	Oracle	Adam	-21.31	-14.39	-16.01	-7.85	-31.51	-14.34	-29.21	-27.97
	Logit	Max $P$	Adam	-6.19	-3.74	-6.63	-4.53	-16.98	-3.11	-7.06	-6.25
	$\log P$	Pseudo	Adam	-18.82	-3.04	-12.44	-5.03	-17.01	-3.49	-14.52	-1.29
	Logit	Pseudo	SGD-M	-17.56	-2.41	-8.81	-6.03	-36.37	-2.95	-26.63	-5.19
	Logit	Pseudo	Adam	-27.02	-1.58	-11.99	-4.51	-24.95	-2.41	-18.37	-3.54
Anti Adv	Logit	Oracle	Adam	-13.15	-22.14	-7.68	-16.45	-12.27	-18.33	-7.91	-32.4
	$\log P$	Oracle	Adam	-7.98	-22.19	-8.06	-16.41	-6.58	-18.71	-3.19	-32.39
	Logit	Max $P$	Adam	-18.36	-0.75	-8.03	-1.17	-15.99	-1.01	-13.65	-1.58
	$\log P$	Pseudo	Adam	-12.47	-0.78	-10.29	-0.72	-11.29	-0.53	-8.87	-0.96
	Logit	Pseudo	SGD-M	-15.71	-0.69	-7.41	-1.28	-20.32	-0.74	-15.51	-1.26
	Logit	Pseudo	Adam	-18.78	-0.67	-9.63	-0.91	-16.11	-0.56	-13.81	-1.59

Table A2: Protection Results(AUC) on 2D data. We evaluate CP and PP on the 2D datasets comparing various models. The reported values represent the average percentage change in AUC of CP and PP across all 2D datasets. The bold denoted the best performance on PP.Target (T)delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

Method				ResNet-18		ResNet-50		VGG-16		ConvNext-t
	L	T	O	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$
Random Noise				0.01	-1.71	-0.44	-1.67	0.01	-1.38	0.21	-0.72
Adv	Logit	Oracle	Adam	-25.03	-6.88	-8.35	-3.69	-24.86	-6.27	-27.53	-13.51
	$\log P$	Oracle	Adam	-16.73	-6.52	-8.55	-3.83	-17.78	-5.86	-18.18	-11.82
	Logit	Max $P$	Adam	-4.16	-3.75	-2.96	-2.08	-3.96	-1.81	-1.04	-1.81
	$\log P$	Pseudo	Adam	-9.08	-3.52	-6.55	-2.57	-7.88	-2.39	-6.04	-1.26
	Logit	Pseudo	SGD-M	-8.71	-3.83	-2.81	-2.94	-14.08	-2.61	-9.58	-1.67
	Logit	Pseudo	Adam	-12.82	-4.29	-5.55	-2.41	-10.98	-2.21	-6.94	-1.11
Anti Adv	Logit	Oracle	Adam	-4.07	-9.29	-3.14	-7.91	-2.69	-7.66	-3.35	-13.51
	$\log P$	Oracle	Adam	-3.83	-9.11	-3.19	-7.45	-1.31	-7.07	-0.82	-13.51
	Logit	Max $P$	Adam	-7.23	-2.97	-3.35	-2.65	-3.28	-1.36	-4.61	-1.35
	$\log P$	Pseudo	Adam	-7.11	-1.61	-3.52	-2.85	-2.24	-1.29	-2.98	-0.72
	Logit	Pseudo	SGD-M	-5.69	-3.62	-1.02	-1.72	-3.54	-1.49	-5.78	-1.74
	Logit	Pseudo	Adam	-7.19	-2.96	-3.11	-2.99	-3.66	-1.62	-4.62	-1.35

Table A3: Protection Results on High-Resolution datasets. We evaluate CP and PP on the High-Resolution datasets , comparing various models. The reported values represent the average percentage change in CP ACC and PP ACC across all High-Resolution datasets.the results in the High-Resolution Dataset present a quantitative analysis that mirrors our previously discussed findings in 2D datasets.

		ResNet-18		ResNet-50		VGG-16		ConvNext-t
Method		CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$	CP $\downarrow$	PP $\uparrow$
Random Noisy		-0.82	-0.47	-0.7	-0.9	-0.8	-0.82	-0.7	-0.93
Adv	Oracle	-26.77	-14.05	-12.45	-4.35	-14.4	-4.52	-49.77	-22.97
Adv	Pseudo	-15.52	-0.4	-6.92	-0.72	-12.81	-0.55	-23.42	-3.5
AntiAdv	Oracle	-19.47	-15.8	-25.25	-16.42	-10.67	-16.72	-6.69	-13.1
AntiAdv	Pseudo	-17.75	-0.42	-9.45	-0.61	-13.2	-0.15	-15.37	-0.62

Table A4: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
DermanMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.744	0.747	0.717	0.717	0.702	0.722	0.706	0.714
	Random Noise				Protected	0.728	0.743	0.707	0.712	0.697	0.724	0.702	0.711
	Adv	Logit	Pseudo	SGD-M	Benign	0.744	0.697	0.717	0.682	0.702	0.681	0.706	0.663
		Logit	Pseudo	SGD-M	Protected	0.109	0.713	0.635	0.683	0.146	0.697	0.036	0.705
		$\log P$	Oracle	Adam	Benign	0.744	0.689	0.717	0.694	0.702	0.241	0.706	0.271
		$\log P$	Oracle	Adam	Protected	0.031	0.858	0.292	0.71	0.147	0.886	0	1
		Logit	Oracle	Adam	Benign	0.744	0.701	0.717	0.688	0.702	0.143	0.706	0.089
		Logit	Oracle	Adam	Protected	0.058	0.841	0.447	0.697	0.189	0.905	0	1
		Logit	Max $P$	Adam	Benign	0.744	0.685	0.717	0.691	0.702	0.699	0.706	0.697
		Logit	Max $P$	Adam	Protected	0.303	0.695	0.565	0.701	0.245	0.707	0.151	0.714
		$\log P$	Pseudo	Adam	Benign	0.744	0.694	0.717	0.696	0.702	0.692	0.706	0.697
		$\log P$	Pseudo	Adam	Protected	0.142	0.711	0.371	0.701	0.247	0.706	0.155	0.702
		Logit	Pseudo	Adam	Benign	0.744	0.686	0.717	0.681	0.702	0.695	0.706	0.693
		Logit	Pseudo	Adam	Protected	0.161	0.715	0.505	0.703	0.259	0.715	0.052	0.704
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.744	0.687	0.717	0.694	0.702	0.475	0.706	0.668
		Logit	Pseudo	SGD-M	Protected	0.743	0.742	0.69	0.713	0.689	0.706	0.706	0.714
		$\log P$	Oracle	Adam	Benign	0.744	0.698	0.717	0.701	0.702	0.684	0.706	0.683
		$\log P$	Oracle	Adam	Protected	0.972	0.989	0.839	0.883	0.863	0.923	0.999	1
		Logit	Oracle	Adam	Benign	0.744	0.693	0.717	0.697	0.702	0.658	0.706	0.628
		Logit	Oracle	Adam	Protected	0.968	0.987	0.742	0.851	0.756	0.932	0.998	1
		Logit	Max $P$	Adam	Benign	0.744	0.689	0.717	0.687	0.702	0.654	0.706	0.709
		Logit	Max $P$	Adam	Protected	0.716	0.743	0.678	0.708	0.691	0.713	0.706	0.717
		$\log P$	Pseudo	Adam	Benign	0.744	0.684	0.717	0.692	0.702	0.661	0.706	0.705
		$\log P$	Pseudo	Adam	Protected	0.741	0.743	0.706	0.705	0.702	0.707	0.706	0.711
		Logit	Pseudo	Adam	Benign	0.744	0.686	0.717	0.694	0.702	0.663	0.706	0.709
		Logit	Pseudo	Adam	Protected	0.741	0.745	0.689	0.711	0.691	0.711	0.706	0.717
BreastMNIST	Random Noise				Benign	0.756	0.808	0.744	0.757	0.801	0.801	0.699	0.705
	Random Noise				Protected	0.756	0.821	0.776	0.751	0.801	0.801	0.731	0.737
	Adv	Logit	Pseudo	SGD-M	Benign	0.756	0.731	0.744	0.494	0.801	0.776	0.699	0.282
		Logit	Pseudo	SGD-M	Protected	0.218	0.769	0.628	0.591	0.737	0.801	0.269	0.269
		$\log P$	Oracle	Adam	Benign	0.756	0.718	0.744	0.442	0.801	0.718	0.699	0.436
		$\log P$	Oracle	Adam	Protected	0	0.955	0.038	0.532	0	0.911	0.013	1
		Logit	Oracle	Adam	Benign	0.756	0.705	0.744	0.583	0.801	0.744	0.699	0.532
		Logit	Oracle	Adam	Protected	0	0.923	0.045	0.596	0	0.841	0.212	1
		Logit	Max $P$	Adam	Benign	0.756	0.763	0.744	0.641	0.801	0.756	0.699	0.756
		Logit	Max $P$	Adam	Protected	0.276	0.635	0.314	0.654	0.429	0.801	0.269	0.333
		$\log P$	Pseudo	Adam	Benign	0.756	0.679	0.744	0.481	0.801	0.718	0.699	0.391
		$\log P$	Pseudo	Adam	Protected	0.212	0.724	0.231	0.686	0.186	0.756	0.199	0.647
		Logit	Pseudo	Adam	Benign	0.756	0.724	0.744	0.494	0.801	0.756	0.699	0.558
		Logit	Pseudo	Adam	Protected	0.212	0.763	0.244	0.679	0.186	0.782	0.269	0.397
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.756	0.365	0.744	0.577	0.801	0.519	0.699	0.699
		Logit	Pseudo	SGD-M	Protected	0.788	0.788	0.661	0.686	0.808	0.827	0.801	0.731
		$\log P$	Oracle	Adam	Benign	0.756	0.635	0.744	0.455	0.801	0.596	0.699	0.731
		$\log P$	Oracle	Adam	Protected	1	1	0.974	0.936	1	1	0.987	1
		Logit	Oracle	Adam	Benign	0.756	0.474	0.744	0.455	0.801	0.455	0.699	0.699
		Logit	Oracle	Adam	Protected	0.974	0.944	0.994	0.949	0.994	1	0.917	1
		Logit	Max $P$	Adam	Benign	0.756	0.295	0.744	0.545	0.801	0.462	0.699	0.538
		Logit	Max $P$	Adam	Protected	0.795	0.801	0.737	0.718	0.833	0.827	0.808	0.756
		$\log P$	Pseudo	Adam	Benign	0.756	0.571	0.744	0.455	0.801	0.423	0.699	0.442
		$\log P$	Pseudo	Adam	Protected	0.788	0.795	0.782	0.763	0.814	0.821	0.801	0.661
		Logit	Pseudo	Adam	Benign	0.756	0.288	0.744	0.449	0.801	0.429	0.699	0.538
		Logit	Pseudo	Adam	Protected	0.788	0.795	0.795	0.776	0.814	0.808	0.808	0.756
OctMNIST	Random Noise				Benign	0.655	0.558	0.753	0.694	0.784	0.695	0.639	0.618
	Random Noise				Protected	0.356	0.629	0.474	0.761	0.379	0.71	0.46	0.64
	Adv	Logit	Pseudo	SGD-M	Benign	0.655	0.275	0.753	0.685	0.784	0.458	0.639	0.307
		Logit	Pseudo	SGD-M	Protected	0.083	0.672	0.291	0.664	0.096	0.772	0.054	0.645
		$\log P$	Oracle	Adam	Benign	0.655	0.191	0.753	0.512	0.784	0.454	0.639	0.235
		$\log P$	Oracle	Adam	Protected	0	1	0.036	0.863	0.003	0.962	0	0.999
		Logit	Oracle	Adam	Benign	0.655	0.117	0.753	0.359	0.784	0.326	0.639	0.224
		Logit	Oracle	Adam	Protected	0	1	0.1	0.914	0.001	1	0	1
		Logit	Max $P$	Adam	Benign	0.655	0.672	0.753	0.636	0.784	0.554	0.639	0.633
		Logit	Max $P$	Adam	Protected	0.232	0.633	0.271	0.664	0.241	0.761	0.261	0.655
		$\log P$	Pseudo	Adam	Benign	0.655	0.237	0.753	0.542	0.784	0.46	0.639	0.355
		$\log P$	Pseudo	Adam	Protected	0.135	0.654	0.071	0.675	0.064	0.749	0.109	0.633
		Logit	Pseudo	Adam	Benign	0.655	0.491	0.753	0.523	0.784	0.32	0.639	0.353
		Logit	Pseudo	Adam	Protected	0.07	0.659	0.146	0.693	0.013	0.783	0.047	0.635
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.655	0.546	0.753	0.678	0.784	0.629	0.639	0.499
		Logit	Pseudo	SGD-M	Protected	0.655	0.664	0.304	0.731	0.635	0.786	0.639	0.638
		$\log P$	Oracle	Adam	Benign	0.655	0.701	0.753	0.694	0.784	0.781	0.639	0.644
		$\log P$	Oracle	Adam	Protected	1	1	0.78	0.96	0.987	1	1	1
		Logit	Oracle	Adam	Benign	0.655	0.607	0.753	0.698	0.784	0.807	0.639	0.653
		Logit	Oracle	Adam	Protected	1	1	0.744	0.988	0.999	1	1	1
		Logit	Max $P$	Adam	Benign	0.655	0.591	0.753	0.712	0.784	0.717	0.639	0.52
		Logit	Max $P$	Adam	Protected	0.655	0.658	0.401	0.726	0.716	0.786	0.639	0.642
		$\log P$	Pseudo	Adam	Benign	0.655	0.622	0.753	0.657	0.784	0.671	0.639	0.531
		$\log P$	Pseudo	Adam	Protected	0.655	0.67	0.652	0.745	0.772	0.783	0.639	0.644
		Logit	Pseudo	Adam	Benign	0.655	0.596	0.753	0.632	0.784	0.726	0.639	0.521
		Logit	Pseudo	Adam	Protected	0.655	0.659	0.612	0.749	0.783	0.783	0.639	0.642

Table A5: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
PneumoniaMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.875	0.846	0.862	0.861	0.865	0.845	0.875	0.899
	Random Noise				Protected	0.885	0.862	0.857	0.856	0.867	0.861	0.853	0.873
	Adv	Logit	Pseudo	SGD-M	Benign	0.857	0.715	0.862	0.741	0.865	0.561	0.875	0.657
		Logit	Pseudo	SGD-M	Protected	0.357	0.833	0.575	0.835	0.197	0.883	0.162	0.854
		$\log P$	Oracle	Adam	Benign	0.857	0.646	0.862	0.631	0.865	0.441	0.875	0.761
		$\log P$	Oracle	Adam	Protected	0.013	0.991	0.053	0.883	0.042	1	0	0.998
		Logit	Oracle	Adam	Benign	0.857	0.641	0.862	0.67	0.865	0.444	0.875	0.704
		Logit	Oracle	Adam	Protected	0.054	0.974	0.037	0.854	0.064	0.998	0	0.998
		Logit	Max $P$	Adam	Benign	0.857	0.628	0.862	0.736	0.865	0.841	0.875	0.731
		Logit	Max $P$	Adam	Protected	0.462	0.83	0.478	0.851	0.428	0.885	0.375	0.745
		$\log P$	Pseudo	Adam	Benign	0.857	0.667	0.862	0.635	0.865	0.606	0.875	0.806
		$\log P$	Pseudo	Adam	Protected	0.173	0.838	0.191	0.824	0.196	0.888	0.162	0.843
		Logit	Pseudo	Adam	Benign	0.857	0.673	0.862	0.734	0.865	0.758	0.875	0.617
		Logit	Pseudo	Adam	Protected	0.215	0.83	0.197	0.843	0.218	0.877	0.162	0.851
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.857	0.715	0.862	0.702	0.865	0.646	0.875	0.822
		Logit	Pseudo	SGD-M	Protected	0.822	0.84	0.612	0.835	0.846	0.867	0.838	0.838
		$\log P$	Oracle	Adam	Benign	0.857	0.625	0.862	0.739	0.865	0.711	0.875	0.875
		$\log P$	Oracle	Adam	Protected	0.997	0.995	0.982	0.968	1	0.998	1	1
		Logit	Oracle	Adam	Benign	0.857	0.63	0.862	0.761	0.865	0.7	0.875	0.875
		Logit	Oracle	Adam	Protected	0.998	0.995	0.974	0.984	1	0.995	1	1
		Logit	Max $P$	Adam	Benign	0.857	0.701	0.862	0.724	0.865	0.755	0.875	0.817
		Logit	Max $P$	Adam	Protected	0.841	0.841	0.764	0.831	0.846	0.848	0.838	0.812
		$\log P$	Pseudo	Adam	Benign	0.857	0.639	0.862	0.744	0.865	0.832	0.875	0.859
		$\log P$	Pseudo	Adam	Protected	0.84	0.841	0.825	0.835	0.846	0.856	0.838	0.851
		Logit	Pseudo	Adam	Benign	0.857	0.697	0.862	0.747	0.865	0.755	0.875	0.817
		Logit	Pseudo	Adam	Protected	0.841	0.84	0.817	0.838	0.846	0.848	0.838	0.812
RetianMNIST	Random Noise				Benign	0.495	0.528	0.505	0.515	0.505	0.502	0.497	0.517
	Random Noise				Protected	0.467	0.528	0.501	0.535	0.502	0.507	0.507	0.551
	Adv	Logit	Pseudo	SGD-M	Benign	0.495	0.417	0.505	0.471	0.505	0.438	0.497	0.527
		Logit	Pseudo	SGD-M	Protected	0.152	0.487	0.401	0.481	0.312	0.443	0.113	0.521
		$\log P$	Oracle	Adam	Benign	0.495	0.328	0.505	0.492	0.505	0.168	0.497	0.265
		$\log P$	Oracle	Adam	Protected	0	0.811	0.072	0.463	0.107	0.733	0.028	1
		Logit	Oracle	Adam	Benign	0.495	0.312	0.505	0.472	0.505	0.233	0.497	0.102
		Logit	Oracle	Adam	Protected	0	0.797	0.142	0.49	0.233	0.627	0.058	1
		Logit	Max $P$	Adam	Benign	0.495	0.485	0.505	0.453	0.505	0.443	0.497	0.491
		Logit	Max $P$	Adam	Protected	0.241	0.501	0.372	0.501	0.315	0.482	0.228	0.547
		$\log P$	Pseudo	Adam	Benign	0.495	0.263	0.505	0.481	0.505	0.441	0.497	0.522
		$\log P$	Pseudo	Adam	Protected	0.133	0.475	0.198	0.487	0.247	0.502	0.163	0.505
		Logit	Pseudo	Adam	Benign	0.495	0.345	0.505	0.481	0.505	0.438	0.497	0.491
		Logit	Pseudo	Adam	Protected	0.142	0.482	0.28	0.502	0.307	0.465	0.172	0.502
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.495	0.355	0.505	0.448	0.505	0.305	0.497	0.537
		Logit	Pseudo	SGD-M	Protected	0.495	0.492	0.435	0.497	0.505	0.525	0.497	0.497
		$\log P$	Oracle	Adam	Benign	0.495	0.448	0.505	0.463	0.505	0.427	0.497	0.501
		$\log P$	Oracle	Adam	Protected	0.981	0.985	0.745	0.701	0.651	0.748	0.988	1
		Logit	Oracle	Adam	Benign	0.495	0.468	0.505	0.472	0.505	0.23	0.497	0.453
		Logit	Oracle	Adam	Protected	0.985	0.983	0.681	0.691	0.651	0.762	0.971	1
		Logit	Max $P$	Adam	Benign	0.495	0.351	0.505	0.461	0.505	0.333	0.497	0.491
		Logit	Max $P$	Adam	Protected	0.491	0.505	0.445	0.505	0.443	0.521	0.497	0.497
		$\log P$	Pseudo	Adam	Benign	0.495	0.422	0.505	0.445	0.505	0.323	0.497	0.485
		$\log P$	Pseudo	Adam	Protected	0.495	0.497	0.482	0.495	0.505	0.522	0.497	0.497
		Logit	Pseudo	Adam	Benign	0.495	0.378	0.505	0.492	0.505	0.318	0.497	0.491
		Logit	Pseudo	Adam	Protected	0.497	0.497	0.472	0.482	0.505	0.517	0.497	0.497
PathMNIST	Random Noise				Benign	0.875	0.869	0.868	0.759	0.864	0.887	0.72	0.682
	Random Noise				Protected	0.605	0.878	0.552	0.723	0.458	0.839	0.452	0.707
	Adv	Logit	Pseudo	SGD-M	Benign	0.875	0.612	0.868	0.831	0.864	0.284	0.721	0.311
		Logit	Pseudo	SGD-M	Protected	0.159	0.84	0.253	0.823	0.128	0.853	0.077	0.723
		$\log P$	Oracle	Adam	Benign	0.875	0.399	0.868	0.722	0.864	0.584	0.72	0.297
		$\log P$	Oracle	Adam	Protected	0.119	0.891	0.131	0.847	0.081	0.952	0.043	0.988
		Logit	Oracle	Adam	Benign	0.875	0.355	0.868	0.695	0.864	0.214	0.72	0.287
		Logit	Oracle	Adam	Protected	0.122	0.948	0.143	0.811	0.105	0.973	0.071	0.992
		Logit	Max $P$	Adam	Benign	0.875	0.745	0.868	0.819	0.864	0.354	0.721	0.529
		Logit	Max $P$	Adam	Protected	0.222	0.829	0.214	0.805	0.147	0.847	0.175	0.721
		$\log P$	Pseudo	Adam	Benign	0.875	0.426	0.868	0.756	0.864	0.607	0.721	0.404
		$\log P$	Pseudo	Adam	Protected	0.168	0.822	0.176	0.816	0.121	0.838	0.156	0.726
		Logit	Pseudo	Adam	Benign	0.875	0.368	0.868	0.726	0.864	0.284	0.721	0.375
		Logit	Pseudo	Adam	Protected	0.153	0.855	0.173	0.808	0.116	0.851	0.083	0.739
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.875	0.814	0.868	0.873	0.864	0.682	0.72	0.552
		Logit	Pseudo	SGD-M	Protected	0.711	0.881	0.436	0.871	0.751	0.852	0.72	0.734
		$\log P$	Oracle	Adam	Benign	0.875	0.775	0.868	0.781	0.864	0.841	0.721	0.592
		$\log P$	Oracle	Adam	Protected	0.954	0.984	0.914	0.979	0.904	0.981	0.992	0.999
		Logit	Oracle	Adam	Benign	0.875	0.757	0.868	0.771	0.864	0.804	0.721	0.542
		Logit	Oracle	Adam	Protected	0.939	0.985	0.895	0.984	0.933	0.986	0.989	0.998
		Logit	Max $P$	Adam	Benign	0.875	0.805	0.868	0.842	0.864	0.617	0.72	0.562
		Logit	Max $P$	Adam	Protected	0.552	0.875	0.531	0.869	0.742	0.845	0.721	0.736
		$\log P$	Pseudo	Adam	Benign	0.875	0.744	0.868	0.739	0.864	0.756	0.721	0.578
		$\log P$	Pseudo	Adam	Protected	0.856	0.882	0.816	0.868	0.808	0.858	0.721	0.734
		Logit	Pseudo	Adam	Benign	0.875	0.736	0.868	0.741	0.864	0.633	0.72	0.545
		Logit	Pseudo	Adam	Protected	0.841	0.882	0.793	0.868	0.828	0.856	0.721	0.738

Table A6: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
BloodMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.937	0.891	0.921	0.812	0.905	0.836	0.845	0.818
	Random Noise				Protected	0.855	0.935	0.821	0.968	0.721	0.932	0.809	0.825
	Adv	Logit	Pseudo	SGD-M	Benign	0.937	0.575	0.921	0.661	0.905	0.198	0.845	0.471
		Logit	Pseudo	SGD-M	Protected	0.028	0.919	0.151	0.836	0.013	0.904	0.011	0.849
		$\log P$	Oracle	Adam	Benign	0.937	0.607	0.921	0.671	0.905	0.486	0.845	0.317
		$\log P$	Oracle	Adam	Protected	0	0.961	0.005	0.879	0.028	0.988	0	1
		Logit	Oracle	Adam	Benign	0.937	0.505	0.92	0.542	0.905	0.231	0.845	0.232
		Logit	Oracle	Adam	Protected	0.011	0.988	0.072	0.901	0.006	0.993	0	1
		Logit	Max $P$	Adam	Benign	0.937	0.768	0.921	0.764	0.905	0.291	0.845	0.605
		Logit	Max $P$	Adam	Protected	0.183	0.894	0.232	0.841	0.124	0.884	0.142	0.834
		$\log P$	Pseudo	Adam	Benign	0.937	0.629	0.921	0.671	0.905	0.717	0.845	0.602
		$\log P$	Pseudo	Adam	Protected	0.044	0.902	0.055	0.849	0.056	0.892	0.058	0.841
		Logit	Pseudo	Adam	Benign	0.937	0.499	0.921	0.614	0.905	0.459	0.845	0.506
		Logit	Pseudo	Adam	Protected	0.035	0.924	0.092	0.851	0.025	0.899	0.013	0.848
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.937	0.557	0.92	0.739	0.905	0.474	0.845	0.539
		Logit	Pseudo	SGD-M	Protected	0.937	0.937	0.747	0.915	0.547	0.91	0.845	0.849
		$\log P$	Oracle	Adam	Benign	0.937	0.734	0.921	0.774	0.905	0.791	0.845	0.836
		$\log P$	Oracle	Adam	Protected	1	1	1	0.999	0.996	1	1	1
		Logit	Oracle	Adam	Benign	0.937	0.786	0.92	0.818	0.905	0.729	0.845	0.838
		Logit	Oracle	Adam	Protected	1	1	0.999	0.999	0.567	0.999	1	1
		Logit	Max $P$	Adam	Benign	0.937	0.507	0.921	0.691	0.905	0.593	0.845	0.586
		Logit	Max $P$	Adam	Protected	0.937	0.937	0.877	0.897	0.521	0.918	0.845	0.844
		$\log P$	Pseudo	Adam	Benign	0.937	0.498	0.921	0.659	0.905	0.782	0.845	0.747
		$\log P$	Pseudo	Adam	Protected	0.937	0.937	0.921	0.921	0.902	0.904	0.845	0.847
		Logit	Pseudo	Adam	Benign	0.937	0.507	0.921	0.728	0.905	0.611	0.845	0.586
		Logit	Pseudo	Adam	Protected	0.937	0.937	0.919	0.919	0.541	0.907	0.845	0.844
OrganCMNIST	Random Noise				Benign	0.886	0.882	0.885	0.875	0.878	0.86	0.645	0.693
	Random Noise				Protected	0.883	0.89	0.874	0.881	0.86	0.874	0.649	0.697
	Adv	Logit	Pseudo	SGD-M	Benign	0.886	0.868	0.885	0.858	0.878	0.325	0.645	0.334
		Logit	Pseudo	SGD-M	Protected	0.085	0.872	0.291	0.84	0.121	0.813	0.061	0.656
		$\log P$	Oracle	Adam	Benign	0.886	0.878	0.885	0.856	0.878	0.686	0.645	0.285
		$\log P$	Oracle	Adam	Protected	0.016	0.858	0.021	0.815	0.17	0.815	0.007	0.995
		Logit	Oracle	Adam	Benign	0.886	0.409	0.885	0.851	0.878	0.585	0.645	0.221
		Logit	Oracle	Adam	Protected	0.014	0.975	0.079	0.802	0.179	0.871	0.012	0.997
		Logit	Max $P$	Adam	Benign	0.886	0.879	0.885	0.871	0.878	0.814	0.645	0.493
		Logit	Max $P$	Adam	Protected	0.144	0.858	0.33	0.856	0.224	0.801	0.114	0.714
		$\log P$	Pseudo	Adam	Benign	0.886	0.88	0.885	0.855	0.878	0.724	0.645	0.561
		$\log P$	Pseudo	Adam	Protected	0.071	0.871	0.082	0.831	0.229	0.794	0.127	0.663
		Logit	Pseudo	Adam	Benign	0.886	0.458	0.885	0.861	0.878	0.644	0.645	0.506
		Logit	Pseudo	Adam	Protected	0.048	0.867	0.114	0.833	0.222	0.822	0.097	0.659
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.886	0.828	0.885	0.845	0.878	0.754	0.645	0.323
		Logit	Pseudo	SGD-M	Protected	0.905	0.905	0.866	0.884	0.878	0.879	0.645	0.66
		$\log P$	Oracle	Adam	Benign	0.886	0.869	0.885	0.861	0.878	0.87	0.645	0.561
		$\log P$	Oracle	Adam	Protected	0.995	0.995	0.993	0.991	0.986	0.989	0.987	0.997
		Logit	Oracle	Adam	Benign	0.886	0.802	0.885	0.871	0.878	0.87	0.645	0.491
		Logit	Oracle	Adam	Protected	0.997	0.996	0.991	0.992	0.974	0.977	0.982	0.998
		Logit	Max $P$	Adam	Benign	0.886	0.753	0.885	0.839	0.878	0.791	0.645	0.508
		Logit	Max $P$	Adam	Protected	0.886	0.886	0.881	0.884	0.878	0.88	0.645	0.656
		$\log P$	Pseudo	Adam	Benign	0.886	0.845	0.885	0.859	0.878	0.836	0.645	0.571
		$\log P$	Pseudo	Adam	Protected	0.886	0.885	0.885	0.886	0.878	0.878	0.645	0.652
		Logit	Pseudo	Adam	Benign	0.886	0.753	0.885	0.846	0.878	0.791	0.645	0.508
		Logit	Pseudo	Adam	Protected	0.886	0.88	0.885	0.886	0.878	0.88	0.645	0.656
OrganAMNIST	Random Noise				Benign	0.867	0.869	0.927	0.922	0.868	0.851	0.763	0.768
	Random Noise				Protected	0.856	0.872	0.901	0.937	0.849	0.871	0.751	0.774
	Adv	Logit	Pseudo	SGD-M	Benign	0.867	0.471	0.927	0.918	0.868	0.326	0.763	0.378
		Logit	Pseudo	SGD-M	Protected	0.052	0.849	0.231	0.883	0.151	0.786	0.043	0.762
		$\log P$	Oracle	Adam	Benign	0.867	0.827	0.927	0.913	0.868	0.764	0.763	0.592
		$\log P$	Oracle	Adam	Protected	0.008	0.803	0.019	0.861	0.248	0.825	0.005	0.969
		Logit	Oracle	Adam	Benign	0.867	0.49	0.927	0.915	0.868	0.593	0.763	0.447
		Logit	Oracle	Adam	Protected	0.004	0.979	0.078	0.858	0.195	0.855	0.008	0.988
		Logit	Max $P$	Adam	Benign	0.867	0.881	0.927	0.939	0.868	0.776	0.763	0.792
		Logit	Max $P$	Adam	Protected	0.141	0.862	0.317	0.898	0.239	0.798	0.098	0.786
		$\log P$	Pseudo	Adam	Benign	0.867	0.787	0.927	0.911	0.868	0.761	0.763	0.641
		$\log P$	Pseudo	Adam	Protected	0.075	0.836	0.063	0.876	0.319	0.792	0.114	0.767
		Logit	Pseudo	Adam	Benign	0.867	0.509	0.927	0.922	0.868	0.606	0.763	0.476
		Logit	Pseudo	Adam	Protected	0.05	0.862	0.111	0.879	0.251	0.804	0.041	0.763
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.867	0.712	0.927	0.919	0.868	0.757	0.763	0.399
		Logit	Pseudo	SGD-M	Protected	0.867	0.867	0.916	0.927	0.868	0.867	0.763	0.762
		$\log P$	Oracle	Adam	Benign	0.867	0.862	0.927	0.921	0.868	0.857	0.763	0.761
		$\log P$	Oracle	Adam	Protected	0.999	0.999	0.999	0.998	0.991	0.989	0.998	0.998
		Logit	Oracle	Adam	Benign	0.867	0.809	0.927	0.919	0.868	0.855	0.763	0.764
		Logit	Oracle	Adam	Protected	0.999	0.998	1	0.998	0.978	0.979	0.998	0.998
		Logit	Max $P$	Adam	Benign	0.867	0.719	0.927	0.912	0.868	0.802	0.763	0.482
		Logit	Max $P$	Adam	Protected	0.867	0.867	0.927	0.926	0.868	0.867	0.763	0.762
		$\log P$	Pseudo	Adam	Benign	0.867	0.827	0.927	0.912	0.868	0.842	0.763	0.681
		$\log P$	Pseudo	Adam	Protected	0.867	0.867	0.928	0.927	0.868	0.868	0.763	0.761
		Logit	Pseudo	Adam	Benign	0.867	0.719	0.927	0.912	0.868	0.802	0.763	0.482
		Logit	Pseudo	Adam	Protected	0.867	0.867	0.928	0.927	0.868	0.867	0.763	0.762

Table A7: Original Results(ACC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in ACC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
OrganSMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.723	0.701	0.751	0.737	0.737	0.744	0.481	0.456
	Random Noise				Protected	0.711	0.715	0.771	0.747	0.748	0.769	0.469	0.466
	Adv	Logit	Pseudo	SGD-M	Benign	0.723	0.675	0.751	0.713	0.737	0.225	0.481	0.278
		Logit	Pseudo	SGD-M	Protected	0.09	0.677	0.213	0.693	0.104	0.698	0.053	0.499
		$\log P$	Oracle	Adam	Benign	0.723	0.639	0.751	0.657	0.737	0.441	0.481	0.199
		$\log P$	Oracle	Adam	Protected	0.01	0.723	0.022	0.661	0.064	0.866	0.008	0.997
		Logit	Oracle	Adam	Benign	0.723	0.261	0.751	0.732	0.737	0.301	0.481	0.163
		Logit	Oracle	Adam	Protected	0.014	0.977	0.065	0.685	0.096	0.872	0.013	0.998
		Logit	Max $P$	Adam	Benign	0.723	0.733	0.751	0.75	0.737	0.681	0.481	0.406
		Logit	Max $P$	Adam	Protected	0.139	0.709	0.28	0.726	0.158	0.689	0.101	0.493
		$\log P$	Pseudo	Adam	Benign	0.723	0.664	0.751	0.697	0.737	0.543	0.481	0.413
		$\log P$	Pseudo	Adam	Protected	0.149	0.683	0.144	0.687	0.192	0.681	0.161	0.485
		Logit	Pseudo	Adam	Benign	0.723	0.302	0.751	0.726	0.737	0.469	0.481	0.373
		Logit	Pseudo	Adam	Protected	0.094	0.709	0.122	0.702	0.154	0.705	0.101	0.494
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.723	0.641	0.751	0.716	0.737	0.636	0.481	0.28
		Logit	Pseudo	SGD-M	Protected	0.76	0.723	0.736	0.748	0.727	0.736	0.481	0.494
		$\log P$	Oracle	Adam	Benign	0.723	0.681	0.751	0.721	0.737	0.714	0.481	0.401
		$\log P$	Oracle	Adam	Protected	0.998	0.998	0.998	0.992	0.985	0.987	0.989	0.998
		Logit	Oracle	Adam	Benign	0.723	0.556	0.751	0.731	0.737	0.672	0.481	0.274
		Logit	Oracle	Adam	Protected	0.994	0.994	0.987	0.982	0.912	0.945	0.983	0.999
		Logit	Max $P$	Adam	Benign	0.723	0.479	0.751	0.711	0.737	0.631	0.481	0.347
		Logit	Max $P$	Adam	Protected	0.722	0.722	0.749	0.747	0.732	0.731	0.481	0.486
		$\log P$	Pseudo	Adam	Benign	0.723	0.647	0.751	0.701	0.737	0.673	0.481	0.402
		$\log P$	Pseudo	Adam	Protected	0.722	0.721	0.751	0.748	0.737	0.737	0.481	0.481
		Logit	Pseudo	Adam	Benign	0.723	0.481	0.751	0.701	0.737	0.621	0.481	0.347
		Logit	Pseudo	Adam	Protected	0.722	0.722	0.751	0.746	0.732	0.735	0.481	0.486
TissueMNIST	Random Noise				Benign	0.649	0.468	0.645	0.475	0.645	0.481	0.561	0.445
	Random Noise				Protected	0.436	0.631	0.498	0.641	0.316	0.601	0.441	0.538
	Adv	Logit	Pseudo	SGD-M	Benign	0.649	0.373	0.645	0.572	0.645	0.328	0.56	0.425
		Logit	Pseudo	SGD-M	Protected	0.134	0.627	0.337	0.583	0.052	0.638	0.026	0.562
		$\log P$	Oracle	Adam	Benign	0.649	0.181	0.645	0.227	0.645	0.108	0.561	0.181
		$\log P$	Oracle	Adam	Protected	0.016	0.978	0.027	0.829	0.027	0.983	0	1
		Logit	Oracle	Adam	Benign	0.649	0.08	0.645	0.253	0.645	0.057	0.561	0.075
		Logit	Oracle	Adam	Protected	0.037	0.986	0.048	0.819	0.048	0.986	0	1
		Logit	Max $P$	Adam	Benign	0.649	0.524	0.645	0.54	0.645	0.48	0.561	0.521
		Logit	Max $P$	Adam	Protected	0.238	0.597	0.265	0.583	0.181	0.607	0.123	0.561
		$\log P$	Pseudo	Adam	Benign	0.649	0.46	0.645	0.484	0.645	0.416	0.561	0.442
		$\log P$	Pseudo	Adam	Protected	0.149	0.62	0.153	0.592	0.138	0.627	0.133	0.563
		Logit	Pseudo	Adam	Benign	0.649	0.417	0.645	0.498	0.645	0.381	0.56	0.462
		Logit	Pseudo	Adam	Protected	0.13	0.626	0.141	0.586	0.095	0.637	0.026	0.563
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.649	0.553	0.645	0.617	0.645	0.431	0.561	0.367
		Logit	Pseudo	SGD-M	Protected	0.641	0.647	0.524	0.642	0.633	0.644	0.561	0.565
		$\log P$	Oracle	Adam	Benign	0.649	0.535	0.645	0.582	0.645	0.561	0.561	0.497
		$\log P$	Oracle	Adam	Protected	0.956	0.996	0.912	0.976	0.946	0.996	1	1
		Logit	Oracle	Adam	Benign	0.649	0.416	0.645	0.541	0.645	0.424	0.561	0.345
		Logit	Oracle	Adam	Protected	0.941	0.998	0.808	0.969	0.893	0.996	0.994	1
		Logit	Max $P$	Adam	Benign	0.649	0.537	0.645	0.572	0.645	0.44	0.56	0.369
		Logit	Max $P$	Adam	Protected	0.62	0.643	0.542	0.639	0.601	0.645	0.56	0.565
		$\log P$	Pseudo	Adam	Benign	0.649	0.574	0.645	0.582	0.645	0.515	0.56	0.453
		$\log P$	Pseudo	Adam	Protected	0.642	0.646	0.626	0.644	0.636	0.645	0.56	0.566
		Logit	Pseudo	Adam	Benign	0.649	0.539	0.645	0.576	0.645	0.434	0.56	0.369
		Logit	Pseudo	Adam	Protected	0.639	0.646	0.587	0.642	0.617	0.646	0.561	0.564

Table A8: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
DermanMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.851	0.854	0.851	0.851	0.791	0.809	0.821	0.821
	Random Noise				Protected	0.831	0.851	0.831	0.861	0.792	0.811	0.819	0.821
	Adv	Logit	Pseudo	SGD-M	Benign	0.851	0.825	0.851	0.807	0.791	0.763	0.821	0.767
		Logit	Pseudo	SGD-M	Protected	0.411	0.818	0.565	0.818	0.595	0.781	0.386	0.828
		$\log P$	Oracle	Adam	Benign	0.851	0.775	0.851	0.806	0.791	0.671	0.821	0.611
		$\log P$	Oracle	Adam	Protected	0.138	0.892	0.331	0.817	0.449	0.855	0.029	1
		Logit	Oracle	Adam	Benign	0.851	0.772	0.851	0.793	0.791	0.529	0.821	0.473
		Logit	Oracle	Adam	Protected	0.072	0.882	0.202	0.802	0.369	0.885	0.005	1
		Logit	Max $P$	Adam	Benign	0.851	0.811	0.851	0.824	0.791	0.786	0.821	0.803
		Logit	Max $P$	Adam	Protected	0.591	0.819	0.608	0.836	0.713	0.791	0.455	0.815
		$\log P$	Pseudo	Adam	Benign	0.851	0.811	0.851	0.823	0.791	0.778	0.821	0.807
		$\log P$	Pseudo	Adam	Protected	0.463	0.832	0.508	0.833	0.645	0.785	0.529	0.83
		Logit	Pseudo	Adam	Benign	0.851	0.798	0.851	0.796	0.791	0.781	0.821	0.81
		Logit	Pseudo	Adam	Protected	0.427	0.825	0.516	0.831	0.672	0.788	0.444	0.835
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.851	0.781	0.851	0.829	0.791	0.759	0.821	0.789
		Logit	Pseudo	SGD-M	Protected	0.781	0.825	0.691	0.845	0.784	0.773	0.718	0.822
		$\log P$	Oracle	Adam	Benign	0.851	0.779	0.851	0.779	0.791	0.792	0.821	0.805
		$\log P$	Oracle	Adam	Protected	0.991	0.995	0.921	0.941	0.871	0.876	0.999	1
		Logit	Oracle	Adam	Benign	0.851	0.761	0.851	0.779	0.791	0.764	0.821	0.765
		Logit	Oracle	Adam	Protected	0.985	0.994	0.896	0.933	0.882	0.875	0.999	1
		Logit	Max $P$	Adam	Benign	0.851	0.781	0.851	0.819	0.791	0.787	0.821	0.804
		Logit	Max $P$	Adam	Protected	0.774	0.836	0.662	0.841	0.778	0.788	0.795	0.82
		$\log P$	Pseudo	Adam	Benign	0.851	0.778	0.851	0.825	0.791	0.776	0.821	0.797
		$\log P$	Pseudo	Adam	Protected	0.791	0.829	0.738	0.844	0.785	0.789	0.735	0.818
		Logit	Pseudo	Adam	Benign	0.851	0.781	0.851	0.831	0.791	0.787	0.821	0.804
		Logit	Pseudo	Adam	Protected	0.785	0.836	0.708	0.848	0.781	0.791	0.795	0.821
BreastMNIST	Random Noise				Benign	0.827	0.925	0.773	0.878	0.833	0.833	0.769	0.778
	Random Noise				Protected	0.803	0.938	0.819	0.897	0.831	0.835	0.772	0.777
	Adv	Logit	Pseudo	SGD-M	Benign	0.827	0.709	0.773	0.716	0.833	0.753	0.769	0.691
		Logit	Pseudo	SGD-M	Protected	0.334	0.699	0.624	0.661	0.501	0.798	0.331	0.724
		$\log P$	Oracle	Adam	Benign	0.827	0.573	0.773	0.752	0.833	0.548	0.769	0.441
		$\log P$	Oracle	Adam	Protected	0.001	0.988	0.028	0.561	0.001	0.981	0.001	1
		Logit	Oracle	Adam	Benign	0.827	0.579	0.773	0.775	0.833	0.623	0.769	0.445
		Logit	Oracle	Adam	Protected	0.001	0.978	0.017	0.617	0.001	0.917	0.011	1
		Logit	Max $P$	Adam	Benign	0.827	0.749	0.773	0.688	0.833	0.796	0.769	0.778
		Logit	Max $P$	Adam	Protected	0.431	0.601	0.567	0.714	0.524	0.841	0.553	0.717
		$\log P$	Pseudo	Adam	Benign	0.827	0.577	0.773	0.698	0.833	0.588	0.769	0.731
		$\log P$	Pseudo	Adam	Protected	0.292	0.712	0.363	0.691	0.268	0.775	0.364	0.767
		Logit	Pseudo	Adam	Benign	0.827	0.647	0.773	0.684	0.833	0.621	0.769	0.755
		Logit	Pseudo	Adam	Protected	0.329	0.712	0.336	0.706	0.301	0.775	0.344	0.765
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.827	0.671	0.773	0.807	0.833	0.746	0.769	0.689
		Logit	Pseudo	SGD-M	Protected	0.738	0.741	0.613	0.761	0.795	0.837	0.739	0.718
		$\log P$	Oracle	Adam	Benign	0.827	0.781	0.773	0.719	0.833	0.813	0.769	0.766
		$\log P$	Oracle	Adam	Protected	1	1	0.999	0.982	1	1	1	1
		Logit	Oracle	Adam	Benign	0.827	0.799	0.773	0.684	0.833	0.754	0.769	0.775
		Logit	Oracle	Adam	Protected	1	1	0.999	0.989	1	1	0.988	1
		Logit	Max $P$	Adam	Benign	0.827	0.554	0.773	0.721	0.833	0.751	0.769	0.705
		Logit	Max $P$	Adam	Protected	0.717	0.762	0.611	0.715	0.782	0.801	0.758	0.739
		$\log P$	Pseudo	Adam	Benign	0.827	0.562	0.773	0.711	0.833	0.766	0.769	0.705
		$\log P$	Pseudo	Adam	Protected	0.816	0.811	0.741	0.694	0.804	0.814	0.755	0.766
		Logit	Pseudo	Adam	Benign	0.827	0.559	0.773	0.721	0.833	0.696	0.769	0.705
		Logit	Pseudo	Adam	Protected	0.716	0.765	0.728	0.711	0.781	0.776	0.758	0.739
OctMNIST	Random Noise				Benign	0.899	0.861	0.943	0.927	0.954	0.919	0.876	0.877
	Random Noise				Protected	0.752	0.881	0.847	0.941	0.759	0.931	0.785	0.874
	Adv	Logit	Pseudo	SGD-M	Benign	0.899	0.607	0.943	0.927	0.954	0.849	0.876	0.597
		Logit	Pseudo	SGD-M	Protected	0.278	0.876	0.523	0.893	0.203	0.915	0.221	0.856
		$\log P$	Oracle	Adam	Benign	0.899	0.343	0.943	0.819	0.954	0.815	0.876	0.471
		$\log P$	Oracle	Adam	Protected	0.051	1	0.119	0.975	0.147	0.983	0.006	1
		Logit	Oracle	Adam	Benign	0.899	0.258	0.943	0.731	0.954	0.622	0.876	0.478
		Logit	Oracle	Adam	Protected	0	1	0.029	0.991	0.012	0.998	0	1
		Logit	Max $P$	Adam	Benign	0.899	0.893	0.943	0.911	0.954	0.841	0.876	0.862
		Logit	Max $P$	Adam	Protected	0.499	0.876	0.527	0.895	0.503	0.919	0.518	0.872
		$\log P$	Pseudo	Adam	Benign	0.899	0.661	0.943	0.854	0.954	0.847	0.876	0.621
		$\log P$	Pseudo	Adam	Protected	0.334	0.887	0.225	0.892	0.251	0.911	0.263	0.858
		Logit	Pseudo	Adam	Benign	0.899	0.721	0.943	0.819	0.954	0.784	0.876	0.619
		Logit	Pseudo	Adam	Protected	0.263	0.879	0.179	0.886	0.136	0.921	0.198	0.852
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.899	0.849	0.943	0.931	0.954	0.891	0.876	0.763
		Logit	Pseudo	SGD-M	Protected	0.832	0.887	0.678	0.901	0.893	0.933	0.801	0.862
		$\log P$	Oracle	Adam	Benign	0.899	0.901	0.943	0.928	0.954	0.933	0.876	0.875
		$\log P$	Oracle	Adam	Protected	1	1	0.991	0.996	0.994	0.999	1	1
		Logit	Oracle	Adam	Benign	0.899	0.837	0.943	0.921	0.954	0.944	0.876	0.871
		Logit	Oracle	Adam	Protected	1	1	0.999	1	0.999	0.999	1	1
		Logit	Max $P$	Adam	Benign	0.899	0.859	0.943	0.911	0.954	0.911	0.876	0.765
		Logit	Max $P$	Adam	Protected	0.826	0.883	0.718	0.891	0.871	0.935	0.797	0.847
		$\log P$	Pseudo	Adam	Benign	0.899	0.862	0.943	0.909	0.954	0.931	0.876	0.808
		$\log P$	Pseudo	Adam	Protected	0.807	0.898	0.876	0.897	0.898	0.929	0.772	0.876
		Logit	Pseudo	Adam	Benign	0.899	0.861	0.943	0.897	0.954	0.926	0.876	0.765
		Logit	Pseudo	Adam	Protected	0.826	0.883	0.866	0.892	0.898	0.933	0.797	0.847

Table A9: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
PneumoniaMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.955	0.921	0.947	0.918	0.952	0.963	0.951	0.956
	Random Noise				Protected	0.946	0.924	0.939	0.931	0.939	0.959	0.932	0.946
	Adv	Logit	Pseudo	SGD-M	Benign	0.955	0.851	0.947	0.869	0.952	0.807	0.951	0.891
		Logit	Pseudo	SGD-M	Protected	0.321	0.826	0.656	0.911	0.314	0.923	0.308	0.905
		$\log P$	Oracle	Adam	Benign	0.955	0.805	0.947	0.538	0.952	0.571	0.951	0.843
		$\log P$	Oracle	Adam	Protected	0	0.999	0.006	0.955	0.008	1	0	1
		Logit	Oracle	Adam	Benign	0.955	0.799	0.947	0.717	0.952	0.584	0.951	0.843
		Logit	Oracle	Adam	Protected	0	0.999	0.003	0.915	0.018	1	0	1
		Logit	Max $P$	Adam	Benign	0.955	0.751	0.947	0.829	0.952	0.925	0.951	0.945
		Logit	Max $P$	Adam	Protected	0.739	0.911	0.443	0.917	0.381	0.949	0.723	0.943
		$\log P$	Pseudo	Adam	Benign	0.955	0.819	0.947	0.587	0.952	0.812	0.951	0.891
		$\log P$	Pseudo	Adam	Protected	0.302	0.825	0.348	0.914	0.309	0.951	0.327	0.903
		Logit	Pseudo	Adam	Benign	0.955	0.817	0.947	0.735	0.952	0.839	0.951	0.888
		Logit	Pseudo	Adam	Protected	0.359	0.824	0.361	0.916	0.323	0.925	0.329	0.911
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.955	0.847	0.947	0.881	0.952	0.949	0.951	0.941
		Logit	Pseudo	SGD-M	Protected	0.716	0.818	0.816	0.881	0.898	0.951	0.861	0.944
		$\log P$	Oracle	Adam	Benign	0.955	0.833	0.947	0.836	0.952	0.921	0.951	0.951
		$\log P$	Oracle	Adam	Protected	1	1	1	0.999	1	1	1	1
		Logit	Oracle	Adam	Benign	0.955	0.867	0.947	0.853	0.952	0.908	0.951	0.951
		Logit	Oracle	Adam	Protected	1	1	1	0.999	1	1	1	1
		Logit	Max $P$	Adam	Benign	0.955	0.838	0.947	0.839	0.952	0.959	0.951	0.932
		Logit	Max $P$	Adam	Protected	0.687	0.836	0.741	0.864	0.887	0.949	0.859	0.937
		$\log P$	Pseudo	Adam	Benign	0.955	0.802	0.947	0.833	0.952	0.948	0.951	0.936
		$\log P$	Pseudo	Adam	Protected	0.907	0.913	0.861	0.857	0.937	0.951	0.961	0.939
		Logit	Pseudo	Adam	Benign	0.955	0.845	0.947	0.832	0.952	0.959	0.951	0.932
		Logit	Pseudo	Adam	Protected	0.687	0.834	0.692	0.838	0.887	0.949	0.859	0.937
RetianMNIST	Random Noise				Benign	0.665	0.679	0.665	0.688	0.671	0.664	0.723	0.725
	Random Noise				Protected	0.654	0.675	0.645	0.671	0.674	0.665	0.721	0.722
	Adv	Logit	Pseudo	SGD-M	Benign	0.665	0.611	0.665	0.646	0.671	0.673	0.723	0.711
		Logit	Pseudo	SGD-M	Protected	0.424	0.661	0.479	0.658	0.537	0.651	0.432	0.708
		$\log P$	Oracle	Adam	Benign	0.665	0.543	0.665	0.643	0.671	0.345	0.723	0.548
		$\log P$	Oracle	Adam	Protected	0.021	0.884	0.135	0.642	0.297	0.731	0.031	1
		Logit	Oracle	Adam	Benign	0.665	0.568	0.665	0.646	0.671	0.492	0.723	0.375
		Logit	Oracle	Adam	Protected	0.015	0.888	0.087	0.646	0.182	0.833	0.019	1
		Logit	Max $P$	Adam	Benign	0.665	0.641	0.665	0.656	0.671	0.663	0.723	0.718
		Logit	Max $P$	Adam	Protected	0.531	0.648	0.538	0.668	0.544	0.642	0.486	0.673
		$\log P$	Pseudo	Adam	Benign	0.665	0.582	0.665	0.627	0.671	0.561	0.723	0.703
		$\log P$	Pseudo	Adam	Protected	0.406	0.645	0.398	0.679	0.371	0.662	0.476	0.712
		Logit	Pseudo	Adam	Benign	0.665	0.609	0.665	0.691	0.671	0.661	0.723	0.711
		Logit	Pseudo	Adam	Protected	0.433	0.633	0.432	0.677	0.516	0.644	0.491	0.715
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.665	0.561	0.665	0.661	0.671	0.611	0.723	0.688
		Logit	Pseudo	SGD-M	Protected	0.625	0.631	0.612	0.667	0.677	0.695	0.701	0.68
		$\log P$	Oracle	Adam	Benign	0.665	0.562	0.665	0.627	0.671	0.637	0.723	0.725
		$\log P$	Oracle	Adam	Protected	0.999	0.993	0.951	0.876	0.803	0.792	0.999	1
		Logit	Oracle	Adam	Benign	0.665	0.625	0.665	0.664	0.671	0.651	0.723	0.712
		Logit	Oracle	Adam	Protected	0.993	0.995	0.955	0.924	0.861	0.861	0.999	1
		Logit	Max $P$	Adam	Benign	0.665	0.533	0.665	0.629	0.671	0.596	0.723	0.695
		Logit	Max $P$	Adam	Protected	0.681	0.687	0.611	0.684	0.632	0.657	0.715	0.694
		$\log P$	Pseudo	Adam	Benign	0.665	0.548	0.665	0.616	0.671	0.603	0.723	0.694
		$\log P$	Pseudo	Adam	Protected	0.631	0.663	0.659	0.668	0.673	0.653	0.712	0.697
		Logit	Pseudo	Adam	Benign	0.665	0.538	0.665	0.652	0.671	0.609	0.723	0.695
		Logit	Pseudo	Adam	Protected	0.684	0.687	0.646	0.627	0.661	0.657	0.715	0.694
PathMNIST	Random Noise				Benign	0.976	0.981	0.971	0.951	0.969	0.968	0.941	0.944
	Random Noise				Protected	0.891	0.981	0.891	0.967	0.911	0.962	0.891	0.942
	Adv	Logit	Pseudo	SGD-M	Benign	0.976	0.942	0.971	0.959	0.969	0.784	0.941	0.859
		Logit	Pseudo	SGD-M	Protected	0.351	0.963	0.597	0.954	0.346	0.957	0.228	0.938
		$\log P$	Oracle	Adam	Benign	0.976	0.862	0.971	0.956	0.969	0.886	0.941	0.782
		$\log P$	Oracle	Adam	Protected	0.214	0.986	0.309	0.973	0.408	0.982	0.181	0.998
		Logit	Oracle	Adam	Benign	0.976	0.741	0.971	0.953	0.969	0.636	0.941	0.684
		Logit	Oracle	Adam	Protected	0.176	0.986	0.233	0.969	0.242	0.981	0.117	0.999
		Logit	Max $P$	Adam	Benign	0.976	0.961	0.971	0.968	0.969	0.824	0.941	0.924
		Logit	Max $P$	Adam	Protected	0.634	0.966	0.634	0.966	0.451	0.953	0.587	0.944
		$\log P$	Pseudo	Adam	Benign	0.976	0.897	0.971	0.962	0.969	0.905	0.941	0.891
		$\log P$	Pseudo	Adam	Protected	0.295	0.961	0.387	0.967	0.459	0.954	0.381	0.938
		Logit	Pseudo	Adam	Benign	0.976	0.836	0.971	0.961	0.969	0.743	0.941	0.899
		Logit	Pseudo	Adam	Protected	0.257	0.964	0.321	0.967	0.297	0.956	0.234	0.945
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.976	0.963	0.971	0.975	0.969	0.924	0.941	0.911
		Logit	Pseudo	SGD-M	Protected	0.913	0.968	0.843	0.964	0.938	0.964	0.921	0.941
		$\log P$	Oracle	Adam	Benign	0.976	0.958	0.971	0.957	0.969	0.968	0.941	0.927
		$\log P$	Oracle	Adam	Protected	0.991	0.993	0.988	0.993	0.968	0.991	0.998	0.999
		Logit	Oracle	Adam	Benign	0.976	0.957	0.971	0.959	0.969	0.963	0.941	0.883
		Logit	Oracle	Adam	Protected	0.992	0.995	0.991	0.995	0.985	0.991	0.995	0.999
		Logit	Max $P$	Adam	Benign	0.976	0.964	0.971	0.954	0.969	0.922	0.941	0.921
		Logit	Max $P$	Adam	Protected	0.792	0.967	0.844	0.957	0.922	0.964	0.921	0.944
		$\log P$	Pseudo	Adam	Benign	0.976	0.952	0.971	0.953	0.969	0.948	0.941	0.936
		$\log P$	Pseudo	Adam	Protected	0.925	0.957	0.935	0.951	0.924	0.954	0.887	0.942
		Logit	Pseudo	Adam	Benign	0.976	0.946	0.971	0.956	0.969	0.917	0.941	0.917
		Logit	Pseudo	Adam	Protected	0.938	0.965	0.938	0.959	0.951	0.961	0.921	0.944

Table A10: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
BloodMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.993	0.991	0.991	0.948	0.895	0.961	0.961	0.957
	Random Noise				Protected	0.976	0.991	0.938	0.919	0.882	0.964	0.951	0.956
	Adv	Logit	Pseudo	SGD-M	Benign	0.993	0.896	0.991	0.941	0.895	0.579	0.961	0.846
		Logit	Pseudo	SGD-M	Protected	0.076	0.978	0.284	0.969	0.165	0.961	0.069	0.961
		$\log P$	Oracle	Adam	Benign	0.993	0.939	0.991	0.939	0.895	0.875	0.961	0.736
		$\log P$	Oracle	Adam	Protected	0.023	0.996	0.156	0.981	0.324	0.989	0.023	1
		Logit	Oracle	Adam	Benign	0.993	0.874	0.991	0.893	0.895	0.677	0.961	0.644
		Logit	Oracle	Adam	Protected	0.037	0.999	0.089	0.988	0.131	0.971	0	1
		Logit	Max $P$	Adam	Benign	0.993	0.965	0.991	0.971	0.895	0.898	0.961	0.929
		Logit	Max $P$	Adam	Protected	0.552	0.979	0.622	0.973	0.469	0.945	0.492	0.966
		$\log P$	Pseudo	Adam	Benign	0.993	0.953	0.991	0.945	0.895	0.924	0.961	0.896
		$\log P$	Pseudo	Adam	Protected	0.087	0.977	0.232	0.974	0.372	0.957	0.147	0.961
		Logit	Pseudo	Adam	Benign	0.993	0.907	0.991	0.928	0.895	0.805	0.961	0.857
		Logit	Pseudo	Adam	Protected	0.102	0.978	0.161	0.974	0.189	0.942	0.078	0.961
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.993	0.963	0.991	0.964	0.895	0.903	0.961	0.879
		Logit	Pseudo	SGD-M	Protected	0.971	0.979	0.926	0.974	0.881	0.956	0.947	0.961
		$\log P$	Oracle	Adam	Benign	0.993	0.973	0.991	0.971	0.895	0.901	0.961	0.961
		$\log P$	Oracle	Adam	Protected	1	1	1	1	0.922	0.931	1	1
		Logit	Oracle	Adam	Benign	0.993	0.991	0.991	0.977	0.895	0.885	0.961	0.961
		Logit	Oracle	Adam	Protected	1	1	1	1	0.907	0.959	1	1
		Logit	Max $P$	Adam	Benign	0.993	0.955	0.991	0.951	0.895	0.916	0.961	0.898
		Logit	Max $P$	Adam	Protected	0.969	0.978	0.951	0.974	0.861	0.957	0.948	0.961
		$\log P$	Pseudo	Adam	Benign	0.993	0.939	0.991	0.955	0.895	0.925	0.961	0.939
		$\log P$	Pseudo	Adam	Protected	0.969	0.971	0.968	0.973	0.891	0.938	0.925	0.959
		Logit	Pseudo	Adam	Benign	0.993	0.955	0.991	0.963	0.895	0.911	0.961	0.898
		Logit	Pseudo	Adam	Protected	0.969	0.978	0.961	0.977	0.881	0.952	0.948	0.961
OrganCMNIST	Random Noise				Benign	0.983	0.983	0.982	0.974	0.951	0.938	0.875	0.913
	Random Noise				Protected	0.982	0.984	0.977	0.976	0.949	0.948	0.875	0.915
	Adv	Logit	Pseudo	SGD-M	Benign	0.983	0.982	0.982	0.981	0.951	0.775	0.875	0.788
		Logit	Pseudo	SGD-M	Protected	0.249	0.981	0.572	0.977	0.527	0.931	0.248	0.878
		$\log P$	Oracle	Adam	Benign	0.983	0.983	0.982	0.978	0.951	0.896	0.875	0.722
		$\log P$	Oracle	Adam	Protected	0.301	0.978	0.359	0.968	0.703	0.938	0.255	0.999
		Logit	Oracle	Adam	Benign	0.983	0.754	0.982	0.978	0.951	0.864	0.875	0.678
		Logit	Oracle	Adam	Protected	0.058	0.994	0.227	0.965	0.618	0.961	0.142	1
		Logit	Max $P$	Adam	Benign	0.983	0.984	0.982	0.983	0.951	0.943	0.875	0.871
		Logit	Max $P$	Adam	Protected	0.675	0.978	0.809	0.977	0.738	0.935	0.608	0.925
		$\log P$	Pseudo	Adam	Benign	0.983	0.983	0.982	0.978	0.951	0.908	0.875	0.859
		$\log P$	Pseudo	Adam	Protected	0.365	0.981	0.419	0.974	0.752	0.926	0.504	0.881
		Logit	Pseudo	Adam	Benign	0.983	0.797	0.982	0.981	0.951	0.891	0.875	0.836
		Logit	Pseudo	Adam	Protected	0.151	0.955	0.297	0.976	0.678	0.942	0.385	0.884
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.983	0.976	0.982	0.977	0.951	0.923	0.875	0.816
		Logit	Pseudo	SGD-M	Protected	0.968	0.972	0.971	0.975	0.953	0.951	0.879	0.893
		$\log P$	Oracle	Adam	Benign	0.983	0.983	0.982	0.978	0.951	0.952	0.875	0.853
		$\log P$	Oracle	Adam	Protected	0.999	0.999	0.998	0.998	0.988	0.989	0.996	0.999
		Logit	Oracle	Adam	Benign	0.983	0.978	0.982	0.979	0.951	0.951	0.875	0.831
		Logit	Oracle	Adam	Protected	1	1	0.996	0.998	0.982	0.983	0.996	0.999
		Logit	Max $P$	Adam	Benign	0.983	0.969	0.982	0.974	0.951	0.929	0.875	0.841
		Logit	Max $P$	Adam	Protected	0.961	0.965	0.961	0.974	0.956	0.951	0.891	0.884
		$\log P$	Pseudo	Adam	Benign	0.983	0.977	0.982	0.977	0.951	0.941	0.875	0.848
		$\log P$	Pseudo	Adam	Protected	0.971	0.972	0.972	0.973	0.951	0.951	0.874	0.876
		Logit	Pseudo	Adam	Benign	0.983	0.969	0.982	0.976	0.951	0.929	0.875	0.841
		Logit	Pseudo	Adam	Protected	0.961	0.965	0.961	0.973	0.956	0.951	0.891	0.884
OrganAMNIST	Random Noise				Benign	0.979	0.982	0.992	0.981	0.959	0.953	0.945	0.947
	Random Noise				Protected	0.976	0.982	0.987	0.991	0.951	0.958	0.943	0.951
	Adv	Logit	Pseudo	SGD-M	Benign	0.979	0.846	0.992	0.991	0.959	0.801	0.945	0.823
		Logit	Pseudo	SGD-M	Protected	0.158	0.961	0.485	0.986	0.543	0.933	0.187	0.925
		$\log P$	Oracle	Adam	Benign	0.979	0.966	0.992	0.991	0.959	0.928	0.945	0.883
		$\log P$	Oracle	Adam	Protected	0.345	0.963	0.405	0.981	0.737	0.951	0.173	0.995
		Logit	Oracle	Adam	Benign	0.979	0.829	0.992	0.991	0.959	0.884	0.945	0.817
		Logit	Oracle	Adam	Protected	0.032	0.996	0.222	0.981	0.611	0.958	0.051	0.998
		Logit	Max $P$	Adam	Benign	0.979	0.984	0.992	0.994	0.959	0.932	0.945	0.955
		Logit	Max $P$	Adam	Protected	0.636	0.981	0.816	0.988	0.751	0.943	0.554	0.951
		$\log P$	Pseudo	Adam	Benign	0.979	0.962	0.992	0.991	0.959	0.929	0.945	0.909
		$\log P$	Pseudo	Adam	Protected	0.353	0.963	0.448	0.985	0.791	0.936	0.353	0.918
		Logit	Pseudo	Adam	Benign	0.979	0.851	0.992	0.992	0.959	0.889	0.945	0.842
		Logit	Pseudo	Adam	Protected	0.144	0.952	0.278	0.986	0.681	0.941	0.201	0.934
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.979	0.957	0.992	0.992	0.959	0.939	0.945	0.874
		Logit	Pseudo	SGD-M	Protected	0.957	0.962	0.981	0.984	0.957	0.951	0.898	0.914
		$\log P$	Oracle	Adam	Benign	0.979	0.979	0.992	0.992	0.959	0.955	0.945	0.945
		$\log P$	Oracle	Adam	Protected	1	1	1	1	0.991	0.991	1	1
		Logit	Oracle	Adam	Benign	0.979	0.971	0.992	0.992	0.959	0.955	0.945	0.945
		Logit	Oracle	Adam	Protected	1	1	0.999	1	0.991	0.991	1	1
		Logit	Max $P$	Adam	Benign	0.979	0.958	0.992	0.991	0.959	0.943	0.945	0.886
		Logit	Max $P$	Adam	Protected	0.953	0.963	0.977	0.984	0.961	0.956	0.908	0.918
		$\log P$	Pseudo	Adam	Benign	0.979	0.973	0.992	0.991	0.959	0.951	0.945	0.926
		$\log P$	Pseudo	Adam	Protected	0.962	0.967	0.981	0.983	0.949	0.948	0.902	0.919
		Logit	Pseudo	Adam	Benign	0.979	0.958	0.992	0.991	0.959	0.943	0.945	0.886
		Logit	Pseudo	Adam	Protected	0.953	0.963	0.977	0.984	0.961	0.956	0.908	0.918

Table A11: Original Results(AUC) on 2D data. We show original result of 2D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. The reported values represent the original value in AUC of our method across all 2D datasets. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions, ”Probability (Max P)” indicates attacks steered by the maximum output probability of the model, and ”Pseudo Labels” refers to using the initial prediction’s highest probability to guide the attack direction. Loss (L) involves two approaches: using Logistic Probability (Logit) directly as the loss function and applying the logarithm of Logistic Probability (

\log P

DataSet	Method					ResNet-18		ResNet-50		VGG-16		ConvNext-t
OrganSMNIST	A	L	T	O	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise				Benign	0.952	0.945	0.961	0.936	0.929	0.924	0.829	0.821
	Random Noise				Protected	0.946	0.949	0.933	0.941	0.924	0.928	0.827	0.826
	Adv	Logit	Pseudo	SGD-M	Benign	0.952	0.945	0.961	0.957	0.929	0.712	0.829	0.762
		Logit	Pseudo	SGD-M	Protected	0.307	0.943	0.512	0.953	0.422	0.924	0.313	0.836
		$\log P$	Oracle	Adam	Benign	0.952	0.941	0.961	0.946	0.929	0.835	0.829	0.657
		$\log P$	Oracle	Adam	Protected	0.277	0.948	0.433	0.942	0.629	0.971	0.197	1
		Logit	Oracle	Adam	Benign	0.952	0.739	0.961	0.961	0.929	0.747	0.829	0.608
		Logit	Oracle	Adam	Protected	0.079	0.998	0.223	0.951	0.422	0.972	0.111	1
		Logit	Max $P$	Adam	Benign	0.952	0.958	0.961	0.964	0.929	0.915	0.829	0.807
		Logit	Max $P$	Adam	Protected	0.636	0.951	0.777	0.957	0.642	0.918	0.566	0.837
		$\log P$	Pseudo	Adam	Benign	0.952	0.945	0.961	0.953	0.929	0.881	0.829	0.797
		$\log P$	Pseudo	Adam	Protected	0.438	0.943	0.545	0.951	0.714	0.914	0.545	0.816
		Logit	Pseudo	Adam	Benign	0.952	0.817	0.961	0.959	0.929	0.826	0.829	0.789
		Logit	Pseudo	Adam	Protected	0.288	0.911	0.343	0.953	0.519	0.921	0.437	0.832
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.952	0.946	0.961	0.955	0.929	0.904	0.829	0.763
		Logit	Pseudo	SGD-M	Protected	0.949	0.943	0.945	0.952	0.931	0.921	0.811	0.838
		$\log P$	Oracle	Adam	Benign	0.952	0.951	0.961	0.959	0.929	0.927	0.829	0.801
		$\log P$	Oracle	Adam	Protected	1	1	0.998	0.999	0.984	0.988	0.995	1
		Logit	Oracle	Adam	Benign	0.952	0.921	0.961	0.958	0.929	0.907	0.829	0.726
		Logit	Oracle	Adam	Protected	0.999	1	0.989	0.996	0.972	0.973	0.995	1
		Logit	Max $P$	Adam	Benign	0.952	0.922	0.961	0.952	0.929	0.901	0.829	0.805
		Logit	Max $P$	Adam	Protected	0.924	0.932	0.936	0.951	0.932	0.921	0.829	0.835
		$\log P$	Pseudo	Adam	Benign	0.952	0.941	0.961	0.951	0.929	0.919	0.829	0.815
		$\log P$	Pseudo	Adam	Protected	0.931	0.938	0.943	0.951	0.921	0.925	0.812	0.831
		Logit	Pseudo	Adam	Benign	0.952	0.922	0.961	0.951	0.929	0.896	0.829	0.805
		Logit	Pseudo	Adam	Protected	0.924	0.933	0.936	0.951	0.932	0.914	0.829	0.835
TissueMNIST	Random Noise				Benign	0.882	0.824	0.886	0.801	0.863	0.831	0.821	0.798
	Random Noise				Protected	0.811	0.857	0.822	0.872	0.761	0.851	0.778	0.811
	Adv	Logit	Pseudo	SGD-M	Benign	0.882	0.749	0.886	0.853	0.863	0.673	0.821	0.718
		Logit	Pseudo	SGD-M	Protected	0.298	0.857	0.571	0.844	0.271	0.858	0.258	0.819
		$\log P$	Oracle	Adam	Benign	0.882	0.389	0.886	0.651	0.863	0.442	0.821	0.489
		$\log P$	Oracle	Adam	Protected	0.119	0.997	0.205	0.943	0.165	0.991	0.111	1
		Logit	Oracle	Adam	Benign	0.882	0.296	0.886	0.603	0.863	0.374	0.821	0.439
		Logit	Oracle	Adam	Protected	0.075	0.999	0.105	0.948	0.101	0.979	0.017	1
		Logit	Max $P$	Adam	Benign	0.882	0.821	0.886	0.847	0.863	0.808	0.821	0.806
		Logit	Max $P$	Adam	Protected	0.557	0.844	0.618	0.845	0.453	0.848	0.483	0.813
		$\log P$	Pseudo	Adam	Benign	0.882	0.791	0.886	0.822	0.863	0.766	0.821	0.745
		$\log P$	Pseudo	Adam	Protected	0.353	0.856	0.373	0.846	0.396	0.856	0.378	0.823
		Logit	Pseudo	Adam	Benign	0.882	0.753	0.886	0.805	0.863	0.722	0.821	0.743
		Logit	Pseudo	Adam	Protected	0.291	0.857	0.285	0.845	0.294	0.862	0.252	0.817
	Anti Adv	Logit	Pseudo	SGD-M	Benign	0.882	0.839	0.886	0.871	0.863	0.794	0.821	0.763
		Logit	Pseudo	SGD-M	Protected	0.841	0.871	0.812	0.871	0.811	0.864	0.801	0.821
		$\log P$	Oracle	Adam	Benign	0.882	0.847	0.886	0.863	0.863	0.826	0.821	0.813
		$\log P$	Oracle	Adam	Protected	0.991	1	0.968	0.996	0.961	0.988	1	1
		Logit	Oracle	Adam	Benign	0.882	0.809	0.886	0.848	0.863	0.791	0.821	0.726
		Logit	Oracle	Adam	Protected	0.977	1	0.94	0.994	0.924	0.978	0.998	1
		Logit	Max $P$	Adam	Benign	0.882	0.835	0.886	0.851	0.863	0.791	0.821	0.756
		Logit	Max $P$	Adam	Protected	0.832	0.871	0.774	0.871	0.798	0.865	0.802	0.821
		$\log P$	Pseudo	Adam	Benign	0.882	0.848	0.886	0.852	0.863	0.814	0.821	0.781
		$\log P$	Pseudo	Adam	Protected	0.831	0.865	0.828	0.861	0.795	0.858	0.771	0.817
		Logit	Pseudo	Adam	Benign	0.882	0.838	0.886	0.851	0.863	0.792	0.821	0.757
		Logit	Pseudo	Adam	Protected	0.837	0.871	0.803	0.871	0.803	0.864	0.802	0.821

Table A12: Original Results(ACC) on 3D data. We show original result of 3D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.

DataSet	Method			ResNet-18		ResNet-50		VGG-16		ConvNext-t
OrganMNIST3d	A	T	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise		Benign	0.926	0.908	0.895	0.795	0.921	0.895	0.598	0.597
	Random Noise		Protected	0.795	0.921	0.664	0.902	0.744	0.923	0.551	0.593
	Adv	Oracle	Benign	0.923	0.901	0.915	0.918	0.926	0.734	0.598	0.303
		Oracle	Protected	0	0.915	0	0.892	0	0.916	0	0.995
		Pseudo	Benign	0.926	0.911	0.895	0.911	0.921	0.616	0.598	0.398
		Pseudo	Protected	0.043	0.889	0.031	0.913	0.038	0.885	0.046	0.613
	Anti Adv	Oracle	Benign	0.923	0.716	0.915	0.828	0.926	0.682	0.598	0.598
		Oracle	Protected	1	1	1	1	0.933	1	1	1
		Pseudo	Benign	0.923	0.544	0.915	0.821	0.926	0.533	0.598	0.428
		Pseudo	Protected	0.923	0.923	0.931	0.931	0.862	0.925	0.598	0.598
NoduleMNIST3d	Random Noise		Benign	0.848	0.845	0.835	0.848	0.871	0.874	0.816	0.829
	Random Noise		Protected	0.858	0.826	0.829	0.848	0.848	0.865	0.801	0.813
	Adv	Oracle	Benign	0.832	0.794	0.835	0.848	0.845	0.319	0.816	0.794
		Oracle	Protected	0.197	1	0	0.845	0	0.997	0.784	1
		Pseudo	Benign	0.848	0.845	0.835	0.865	0.871	0.813	0.816	0.794
		Pseudo	Protected	0.123	0.845	0.155	0.839	0.352	0.858	0.794	0.794
	Anti Adv	Oracle	Benign	0.832	0.813	0.835	0.868	0.845	0.806	0.816	0.842
		Oracle	Protected	1	1	1	1	1	1	0.958	1
		Pseudo	Benign	0.832	0.855	0.835	0.829	0.845	0.842	0.816	0.529
		Pseudo	Protected	0.829	0.852	0.858	0.852	0.861	0.861	0.848	0.206
FractureMNIST3d	Random Noise		Benign	0.479	0.412	0.533	0.512	0.517	0.483	0.417	0.425
	Random Noise		Protected	0.479	0.412	0.533	0.517	0.517	0.483	0.417	0.425
	Adv	Oracle	Benign	0.458	0.338	0.433	0.208	0.5 0	217 0	417 0	212
		Oracle	Protected	0	1	0.433	1	0	0.996	0	1
		Pseudo	Benign	0.479	0.454	0.533	0.525	0.517	0.471	0.417	0.375
		Pseudo	Protected	0.263	0.454	0.308	0.492	0.501	0.446	0.388	0.375
	Anti Adv	Oracle	Benign	0.458	0.371	0.433	0.196	0.501	0.451	0.417	0.404
		Oracle	Protected	1	1	0.433	1	1	1	1	1
		Pseudo	Benign	0.458	0.254	0.433	0.458	0.501	0.396	0.417	0.329
		Pseudo	Protected	0.517	0.508	0.433	0.454	0.501	0.501	0.417	0.392

Table A13: Original Results(AUC) on 3D data. We show original result of 3D datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.

DataSet	Method			ResNet-18		ResNet-50		VGG-16		ConvNext-t
OrganMNIST3d	A	T	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise		Benign	0.995	0.994	0.989	0.973	0.995	0.992	0.877	0.878
	Random Noise		Protected	0.979	0.996	0.953	0.989	0.979	0.995	0.865	0.871
	Adv	Oracle	Benign	0.993	0.992	0.994	0.992	0.988	0.952	0.877	0.756
		Oracle	Protected	0.008	0.992	0	0.988	0.045	0.991	0	0.998
		Pseudo	Benign	0.995	0.993	0.989	0.996	0.995	0.925	0.877	0.838
		Pseudo	Protected	0.096	0.991	0.109	0.994	0.147	0.982	0.206	0.879
	Anti Adv	Oracle	Benign	0.993	0.991	0.994	0.992	0.988	0.973	0.877	0.877
		Oracle	Protected	1	1	1	1	0.994	1	1	1
		Pseudo	Benign	0.993	0.967	0.994	0.985	0.988	0.955	0.877	0.867
		Pseudo	Protected	0.965	0.982	0.981	0.987	0.974	0.982	0.812	0.861
NoduleMNIST3d	Random Noise		Benign	0.803	0.871	0.822	0.841	0.917	0.912	0.857	0.847
	Random Noise		Protected	0.799	0.853	0.791	0.864	0.909	0.896	0.861	0.844
	Adv	Oracle	Benign	0.737	0.761	0.822	0.841	0.862	0.461	0.857	0.429
		Oracle	Protected	0	1	0	0.832	0	1	0	1
		Pseudo	Benign	0.803	0.868	0.822	0.871	0.917	0.748	0.857	0.765
		Pseudo	Protected	0.318	0.866	0.334	0.857	0.359	0.815	0.324	0.765
	Anti Adv	Oracle	Benign	0.737	0.812	0.822	0.865	0.862	0.835	0.857	0.86
		Oracle	Protected	1	1	1	1	1	1	1	1
		Pseudo	Benign	0.737	0.865	0.822	0.821	0.862	0.819	0.857	0.851
		Pseudo	Protected	0.771	0.874	0.771	0.811	0.745	0.836	0.846	0.817
FractureMNIST3d	Random Noise		Benign	0.577	0.654	0.691	0.716	0.619	0.641	0.564	0.547
	Random Noise		Protected	0.577	0.655	0.691	0.716	0.619	0.639	0.564	0.547
	Adv	Oracle	Benign	0.583	0.524	0.473	0.536	0.629	0.531	0.564	0.485
		Oracle	Protected	0	1	0	1	0.007	1	0	1
		Pseudo	Benign	0.577	0.611	0.691	0.665	0.619	0.647	0.564	0.534
		Pseudo	Protected	0.433	0.609	0.444	0.649	0.586	0.651	0.538	0.544
	Anti Adv	Oracle	Benign	0.583	0.539	0.473	0.571	0.629	0.581	0.564	0.563
		Oracle	Protected	1	1	1	1	1	1	1	1
		Pseudo	Benign	0.583	0.512	0.473	0.577	0.629	0.602	0.564	0.546
		Pseudo	Protected	0.633	0.603	0.456	0.579	0.628	0.644	0.552	0.529

Table A14: Raw Results(ACC) on High-Resolution data. We show original result of High-Resolution datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.

DataSet	Method			ResNet-18		ResNet-50		VGG-16		ConvNext-t
BloodMNIST	A	T	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise		Benign	0.985	0.982	0.983	0.985	0.986	0.983	0.856	0.857
	Random Noise		Protected	0.977	0.983	0.981	0.984	0.974	0.988	0.855	0.864
	Adv	Oracle	Benign	0.985	0.593	0.983	0.543	0.986	0.869	0.856	0.786
		Oracle	Protected	1	1	1	1	1	1	0.989	1
		Pseudo	Benign	0.985	0.833	0.983	0.866	0.986	0.875	0.856	0.803
		Pseudo	Protected	0.985	0.985	0.983	0.983	0.986	0.986	0.856	0.867
	Anti Adv	Oracle	Benign	0.985	0.701	0.983	0.956	0.986	0.778	0.856	0.433
		Oracle	Protected	0	1	0.007	0.992	0.009	0.997	0.146	0.999
		Pseudo	Benign	0.985	0.784	0.983	0.968	0.986	0.868	0.856	0.881
		Pseudo	Protected	0.011	0.985	0.013	0.981	0.011	0.985	0.172	0.926
RetianMNIST	Random Noise		Benign	0.551	0.541	0.531	0.521	0.547	0.525	0.542	0.527
	Random Noise		Protected	0.541	0.545	0.515	0.512	0.555	0.535	0.541	0.531
	Adv	Oracle	Benign	0.551	0.521	0.531	0.323	0.547	0.445	0.542	0.461
		Oracle	Protected	1	1	0.995	0.968	1	0.998	0.841	1
		Pseudo	Benign	0.551	0.438	0.531	0.482	0.547	0.505	0.542	0.502
		Pseudo	Protected	0.551	0.552	0.531	0.517	0.547	0.547	0.542	0.545
	Anti Adv	Oracle	Benign	0.551	0.362	0.531	0.532	0.547	0.537	0.542	0.05
		Oracle	Protected	0	0.94	0	0.515	0	0.537	0.307	1
		Pseudo	Benign	0.551	0.547	0.531	0.505	0.547	0.532	0.542	0.505
		Pseudo	Protected	0.142	0.552	0.117	0.512	0.147	0.527	0.455	0.481
BreastMNIST	Random Noise		Benign	0.853	0.878	0.808	0.841	0.846	0.859	0.353	0.429
	Random Noise		Protected	0.859	0.878	0.795	0.827	0.833	0.859	0.346	0.436
	Adv	Oracle	Benign	0.853	0.269	0.808	0.718	0.846	0.724	0.353	0.276
		Oracle	Protected	1	1	0.962	0.962	1	1	0.846	1
		Pseudo	Benign	0.853	0.269	0.808	0.641	0.846	0.455	0.353	0.269
		Pseudo	Protected	0.841	0.841	0.769	0.776	0.846	0.865	0.718	0.705
	Anti Adv	Oracle	Benign	0.853	0.737	0.808	0.769	0.846	0.859	0.353	0.269
		Oracle	Protected	0	1	0	0.776	0	0.865	0.269	1
		Pseudo	Benign	0.853	0.782	0.808	0.712	0.846	0.865	0.353	0.269
		Pseudo	Protected	0.161	0.872	0.231	0.724	0.154	0.878	0.269	0.269
OrganAMNIST	Random Noise		Benign	0.951	0.949	0.952	0.946	0.959	0.959	0.871	0.881
	Random Noise		Protected	0.864	0.949	0.921	0.951	0.933	0.959	0.843	0.879
	Adv	Oracle	Benign	0.951	0.849	0.952	0.807	0.959	0.925	0.871	0.709
		Oracle	Protected	1	1	1	0.999	1	1	0.955	1
		Pseudo	Benign	0.951	0.761	0.952	0.881	0.959	0.902	0.871	0.493
		Pseudo	Protected	0.951	0.951	0.952	0.952	0.959	0.959	0.832	0.871
	Anti Adv	Oracle	Benign	0.951	0.656	0.952	0.951	0.959	0.609	0.871	0.095
		Oracle	Protected	0	0.988	0.002	0.941	0.002	0.986	0	1
		Pseudo	Benign	0.951	0.571	0.952	0.951	0.959	0.603	0.871	0.182
		Pseudo	Protected	0.017	0.951	0.014	0.945	0.002	0.958	0.022	0.871
PneumoniaMNIST	Random Noise		Benign	0.881	0.862	0.843	0.853	0.837	0.831	0.812	0.814
	Random Noise		Protected	0.871	0.871	0.849	0.859	0.796	0.856	0.816	0.821
	Adv	Oracle	Benign	0.881	0.625	0.843	0.625	0.837	0.663	0.812	0.812
		Oracle	Protected	1	0.998	1	0.998	1	1	1	1
		Pseudo	Benign	0.881	0.625	0.843	0.702	0.837	0.856	0.812	0.667
		Pseudo	Protected	0.867	0.865	0.833	0.832	0.832	0.832	0.804	0.801
	Anti Adv	Oracle	Benign	0.881	0.577	0.843	0.375	0.837	0.845	0.812	0.511
		Oracle	Protected	0	1	0	0.982	0	0.97	0	1
		Pseudo	Benign	0.881	0.843	0.843	0.607	0.837	0.814	0.812	0.625
		Pseudo	Protected	0.133	0.867	0.167	0.841	0.168	0.837	0.196	0.804

Table A15: Raw Results(AUC) on High-Resolution data. We show original result of High-Resolution datasets comparing various models. CP is Protected model on Benign data and PP is on Protected model on Protected data. Target (T) delineates the adversarial attack aim: ”Oracle” represents the ideal conditions and ”Pseudo” refers to using the initial prediction’s highest probability to guide the attack direction. Adv/Anti-adv indicates whether the strategy aims to minimize or maximize the loss function. Dataset (D) represent benign (original) dataset and protected dataset.Using Logistic Probability (Logit) directly as the loss function and Adam as optimizer.

DataSet	Method			ResNet-18		ResNet-50		VGG-16		ConvNext-t
BloodMNIST	A	T	D	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected	Surrogate	Protected
	Random Noise		Benign	0.999	0.999	0.998	0.998	0.999	0.999	0.961	0.968
	Random Noise		Protected	0.999	0.999	0.998	0.998	0.999	0.999	0.955	0.965
	Adv	Oracle	Benign	0.999	0.993	0.998	0.985	0.999	0.996	0.961	0.953
		Oracle	Protected	1	1	1	1	1	1	0.998	1
		Pseudo	Benign	0.999	0.983	0.998	0.983	0.999	0.987	0.961	0.957
		Pseudo	Protected	0.992	0.997	0.992	0.996	0.997	0.998	0.956	0.975
	Anti Adv	Oracle	Benign	0.999	0.937	0.998	0.996	0.999	0.982	0.961	0.825
		Oracle	Protected	0.094	1	0.138	0.999	0.066	1	0.496	1
		Pseudo	Benign	0.999	0.953	0.998	0.996	0.999	0.991	0.961	0.979
		Pseudo	Protected	0.106	0.997	0.152	0.998	0.075	0.999	0.566	0.983
RetianMNIST	Random Noise		Benign	0.708	0.694	0.687	0.696	0.722	0.706	0.719	0.722
	Random Noise		Protected	0.713	0.694	0.684	0.691	0.717	0.705	0.721	0.723
	Adv	Oracle	Benign	0.708	0.691	0.687	0.682	0.722	0.689	0.719	0.675
		Oracle	Protected	1	1	1	0.996	1	1	0.971	1
		Pseudo	Benign	0.708	0.599	0.687	0.656	0.722	0.681	0.719	0.698
		Pseudo	Protected	0.702	0.684	0.694	0.673	0.724	0.735	0.722	0.707
	Anti Adv	Oracle	Benign	0.708	0.621	0.687	0.705	0.722	0.729	0.719	0.546
		Oracle	Protected	0	0.969	0.001	0.692	0	0.717	0.282	1
		Pseudo	Benign	0.708	0.691	0.687	0.709	0.722	0.726	0.719	0.722
		Pseudo	Protected	0.414	0.695	0.334	0.705	0.349	0.723	0.634	0.712
BreastMNIST	Random Noise		Benign	0.846	0.873	0.812	0.861	0.853	0.897	0.558	0.621
	Random Noise		Protected	0.841	0.868	0.801	0.846	0.855	0.912	0.563	0.624
	Adv	Oracle	Benign	0.846	0.835	0.813	0.821	0.853	0.856	0.558	0.638
		Oracle	Protected	1	1	1	0.997	1	1	0.957	1
		Pseudo	Benign	0.846	0.661	0.813	0.851	0.853	0.894	0.558	0.531
		Pseudo	Protected	0.791	0.789	0.831	0.841	0.821	0.895	0.588	0.607
	Anti Adv	Oracle	Benign	0.846	0.752	0.813	0.839	0.853	0.901	0.558	0.412
		Oracle	Protected	0	1	0	0.821	0	0.891	0	1
		Pseudo	Benign	0.846	0.871	0.813	0.861	0.853	0.896	0.558	0.613
		Pseudo	Protected	0.222	0.861	0.252	0.861	0.211	0.901	0.404	0.434
OrganAMNIST	Random Noise		Benign	0.996	0.996	0.996	0.995	0.997	0.997	0.968	0.971
	Random Noise		Protected	0.988	0.996	0.993	0.996	0.994	0.997	0.965	0.969
	Adv	Oracle	Benign	0.996	0.993	0.996	0.993	0.997	0.995	0.968	0.969
		Oracle	Protected	1	1	1	1	1	1	1	0.999
		Pseudo	Benign	0.996	0.985	0.996	0.991	0.997	0.993	0.968	0.935
		Pseudo	Protected	0.972	0.991	0.976	0.989	0.988	0.993	0.927	0.956
	Anti Adv	Oracle	Benign	0.996	0.954	0.996	0.996	0.997	0.915	0.968	0.649
		Oracle	Protected	0.093	0.999	0.042	0.995	0.053	0.999	0.011	1
		Pseudo	Benign	0.996	0.948	0.996	0.996	0.997	0.918	0.968	0.758
		Pseudo	Protected	0.137	0.993	0.081	0.995	0.079	0.995	0.088	0.961
PneumoniaMNIST	Random Noise		Benign	0.963	0.961	0.947	0.963	0.947	0.935	0.876	0.88
	Random Noise		Protected	0.968	0.966	0.942	0.961	0.939	0.945	0.864	0.873
	Adv	Oracle	Benign	0.963	0.928	0.947	0.821	0.947	0.914	0.876	0.876
		Oracle	Protected	1	1	1	1	1	1	1	1
		Pseudo	Benign	0.963	0.925	0.947	0.918	0.947	0.955	0.876	0.623
		Pseudo	Protected	0.753	0.921	0.651	0.882	0.636	0.915	0.589	0.907
	Anti Adv	Oracle	Benign	0.963	0.886	0.947	0.773	0.947	0.951	0.876	0.56
		Oracle	Protected	0	1	0	0.999	0	0.998	0	1
		Pseudo	Benign	0.963	0.894	0.947	0.915	0.947	0.961	0.876	0.571
		Pseudo	Protected	0.277	0.846	0.344	0.963	0.407	0.899	0.421	0.902