GenDet: Towards Good Generalizations for AI-Generated Image Detection

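Detection method	Generative Adversarial Networks						Deep fakes	Low level vision		Perceptual loss		Guided	LDM			Glide			DALL-E	Avg. Acc (%)
Detection method	ProGAN	CycleGAN	BigGAN	StyleGAN	GauGAN	StarGAN	Deep fakes	SITD	SAN	CRN	IMLE	Guided	200 steps	200 w/ CFG	100 steps	100-27	50-27	100-10	DALL-E	Avg. Acc (%)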
CNNDet [43]	99.99	85.20	70.20	85.70	78.95	91.70	53.47	66.67	48.69	86.31	86.26	60.07	54.03	54.96	54.14	60.78	63.80	65.66	55.58	69.58
Patchfor [10]	75.03	68.97	68.47	79.16	64.23	63.94	75.54	75.14	75.28	72.33	55.30	67.41	76.50	76.10	75.77	74.81	73.28	68.52	67.91	71.24
Co-occurrence [28]	97.70	63.15	53.75	92.50	51.10	54.70	57.10	63.06	55.85	65.65	65.80	60.50	70.70	70.55	71.00	70.25	69.60	69.90	67.55	66.86
Spec [47]	49.90	99.90	50.50	49.90	50.30	99.70	50.10	50.00	48.00	50.60	50.10	50.90	50.40	50.40	50.30	51.70	51.40	50.40	50.00	55.45
DIRE ${}^{*}$ [44]	100.0	67.73	64.78	83.08	65.30	100.0	94.75	57.62	60.96	62.36	62.31	83.20	82.70	84.05	84.25	87.10	90.80	90.25	58.75	77.89
Ojha et al. ${}^{*}$ [30]	100.0	98.50	94.50	82.00	99.50	97.00	66.60	63.00	57.50	59.50	72.00	70.03	94.19	73.76	94.36	79.07	79.85	78.14	86.78	81.38
Teacher	100.0	88.05	93.40	87.55	99.00	84.40	74.50	71.50	75.00	86.90	95.15	83.40	95.20	83.45	95.15	91.70	91.35	89.70	85.60	87.95
Teacher-Student AD	98.50	99.00	98.75	98.80	98.75	94.65	81.85	64.50	53.50	85.85	97.55	97.40	97.25	98.70	98.75	98.75	98.70	98.70	95.75	92.41
Teacher-Student AD w/fake	98.50	98.95	98.75	98.85	98.75	93.35	86.75	57.00	55.50	96.70	98.75	98.35	98.75	98.40	98.75	98.75	98.75	98.75	97.50	93.15
GenDet	99.00	99.50	99.30	99.05	99.00	96.75	88.20	63.50	67.50	93.90	98.75	98.70	98.80	98.60	98.75	98.75	98.75	98.75	98.45	94.42

$M$	$\alpha$	Avg. Acc (%)	mAP (%)
2.0	0.01	94.14	95.23
2.0	0.1	94.42	95.64
2.0	1.0	93.31	94.71
1.0	0.1	93.78	95.07
2.0	0.1	94.42	95.64
4.0	0.1	93.72	94.96

Method	CogView2	StyleGAN	IF	mAP (%)
ResNet50	95.9	90.0	95.3	93.7
Ojha et al. ${}^{*}$	93.6	43.8	95.4	77.6
GenDet	95.1	92.3	98.3	95.2

[1] Brain mri images for brain tumor detection. https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection.

[2] IF. https://github.com/deep-floyd/IF/tree/develop, 2022.

[3] Head ct - hemorrhage. https://www.kaggle.com/datasets/felipekitamura/head-ct-hemorrhage, 2022.

[4] Midjourney. https://www.midjourney.com/home/, 2022.

[5] Wukong. https://xihe.mindspore.cn/modelzoo/wukong, 2022.

[6] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9592–9600, 2019.

[7] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4183–4192, 2020.

[8] Jordan J Bird and Ahmad Lotfi. Cifake: Image classification and explainable identification of ai-generated synthetic images. arXiv preprint arXiv:2303.14126, 2023.

[9] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.

[10] Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? understanding properties that generalize. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16, pages 103–120. Springer, 2020.

[11] Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3291–3300, 2018.

[12] Qifeng Chen and Vladlen Koltun. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE international conference on computer vision, pages 1511–1520, 2017.

[13] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797, 2018.

[14] Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, and Lei Zhang. Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11065–11074, 2019.

[15] Hanqiu Deng and Xingyu Li. Anomaly detection via reverse distillation from one-class embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9737–9746, 2022.

[16] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.

[17] Ming Ding, Wendi Zheng, Wenyi Hong, and Jie Tang. Cogview2: Faster and better text-to-image generation via hierarchical transformers. Advances in Neural Information Processing Systems, 35:16890–16902, 2022.

[18] Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, and Jingjing Liu. Large-scale adversarial training for vision-and-language representation learning. Advances in Neural Information Processing Systems, 33:6616–6628, 2020.

[19] Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, and Baining Guo. Vector quantized diffusion model for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10696–10706, 2022.

[20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[21] Ai creates photo evidence of 2001 earthquake that never happened. https://www.forbes.com/sites/mattnovak/2023/03/27/ai-creates-photo-evidence-of-2001-earthquake-that-never-happened/?sh=250435c83985, 2023.

[22] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.

[23] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.

[24] Ke Li, Tianhao Zhang, and Jitendra Malik. Diverse image synthesis from semantic layouts via conditional imle. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4220–4229, 2019.

[25] Zhi Li, Rizhao Cai, Haoliang Li, Kwok-Yan Lam, Yongjian Hu, and Alex C Kot. One-class knowledge distillation for face presentation attack detection. IEEE Transactions on Information Forensics and Security, 17:2137–2150, 2022.

[26] Zhengzhe Liu, Xiaojuan Qi, and Philip HS Torr. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8060–8069, 2020.

[27] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.

[28] Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H Bappy, Amit K Roy-Chowdhury, and BS Manjunath. Detecting gan generated fake images using co-occurrence matrices. arXiv preprint arXiv:1903.06836, 2019.

[29] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.

[30] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480–24489, 2023.

[31] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2337–2346, 2019.

[32] Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII, pages 86–103. Springer, 2020.

[33] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.

[34] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.

[35] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2021.

[36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.

[37] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1–11, 2019.

[38] Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad H Rohban, and Hamid R Rabiee. Multiresolution knowledge distillation for anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14902–14912, 2021.

[39] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. In International conference on machine learning, pages 10347–10357. PMLR, 2021.

[40] Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Trung-Kien Nguyen, and Ngai-Man Cheung. On data augmentation for gan training. IEEE Transactions on Image Processing, 30:1882–1897, 2021.

[41] Luisa Verdoliva, Davide Cozzolino, and Koki Nagano. 2022 ieee image and video processing cup synthetic image detection.

[42] Dilin Wang, Chengyue Gong, and Qiang Liu. Improving neural language modeling via adversarial training. In International Conference on Machine Learning, pages 6555–6565. PMLR, 2019.

[43] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. Cnn-generated images are surprisingly easy to spot… for now. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8695–8704, 2020.

[44] Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. Dire for diffusion-generated image detection. arXiv preprint arXiv:2303.09295, 2023.

[45] Ziyi Xi, Wenmin Huang, Kangkang Wei, Weiqi Luo, and Peijia Zheng. Ai-generated image detection using a cross-attention enhanced dual-stream network. arXiv preprint arXiv:2306.07005, 2023.

[46] Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L Yuille, and Quoc V Le. Adversarial examples improve image recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 819–828, 2020.

[47] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in gan fake images. In 2019 IEEE international workshop on information forensics and security (WIFS), pages 1–6. IEEE, 2019.

[48] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.

[49] Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image. arXiv preprint arXiv:2306.08571, 2023.

GenDet: Towards Good Generalizations for AI-Generated Image Detection

Abstract

1 Introduction

2 Related Work

2.1 AI-Generated Image Detection

2.2 Teacher-Student Anomaly Detection

3 Method

3.1 Problem Definition and Method Overview

3.2 Teacher-Student Discrepancy-Aware Learning

3.3 Generalized Feature Augmentation

4 Experiments

4.1 Datasets and Experimental Settings

Datasets.

Implementation Details.

Evaluation Metrics.

4.2 GenDet on UniversalFakeDetect Dataset

Comparison to existing methods.

Effectiveness of GenDet.

Effectiveness of hyperparameters $M$ and $\alpha$.

4.3 GenDet on GenImage Dataset

Comparison to existing methods.

Degraded Image Classification.

Cross-Dataset Evaluation.

Visualization.

5 Conclusion

References

Detection method	Generative Adversarial Networks						Deep fakes	Low level vision		Perceptual loss		Guided	LDM			Glide			DALL-E	mAP (%)
Detection method	ProGAN	CycleGAN	BigGAN	StyleGAN	GauGAN	StarGAN	Deep fakes	SITD	SAN	CRN	IMLE	Guided	200 steps	200 w/ CFG	100 steps	100-27	50-27	100-10	DALL-E	mAP (%)
CNNDet [43]	100.0	93.47	84.50	99.54	89.49	98.15	89.02	73.75	59.47	98.24	98.40	73.72	70.62	71.00	70.54	80.65	84.91	82.07	70.59	83.58
Patchfor [10]	80.88	72.84	71.66	85.75	65.99	69.25	76.55	76.19	76.34	74.52	68.52	75.03	87.10	86.72	86.40	85.37	83.73	78.38	75.67	77.73
Co-occurrence [28]	99.74	80.95	50.61	98.63	53.11	67.99	59.14	68.98	60.42	73.06	87.21	70.20	91.21	89.02	92.39	89.32	88.35	82.79	80.96	78.11
Spec [47]	55.39	100.0	75.08	55.11	66.08	100.0	45.18	47.46	57.12	53.61	50.98	57.72	77.72	77.25	76.47	68.58	64.58	61.92	67.77	66.21
DIRE ${}^{*}$ [44]	100.0	76.73	72.8	97.06	68.44	100.0	98.55	54.51	65.62	97.10	93.74	94.29	95.17	95.43	95.77	96.18	97.30	97.53	68.73	87.63
Ojha et al. ${}^{*}$ [30]	100.0	99.46	99.59	97.24	99.98	99.60	82.45	61.32	79.02	96.72	99.00	87.77	99.14	92.15	99.17	94.74	95.34	94.57	97.15	93.38
Teacher	100.0	95.65	98.38	96.10	99.94	92.86	77.68	72.18	81.63	94.43	99.00	92.26	98.98	92.25	99.16	97.11	96.77	96.38	93.80	93.40
Teacher-Student AD	99.86	99.87	99.79	99.69	99.73	98.64	84.97	62.21	54.21	90.98	98.71	98.89	99.53	98.92	99.51	99.17	99.16	98.99	98.69	93.76
Teacher-Student AD w/fake	99.92	99.83	99.70	99.65	99.69	97.79	91.71	56.08	56.68	98.61	98.87	98.96	99.32	99.12	99.33	98.99	99.03	98.92	99.01	94.27
GenDet	99.95	99.95	99.92	99.92	99.92	99.25	91.38	61.23	72.66	97.90	98.88	99.30	99.85	99.51	99.85	99.50	99.46	99.19	99.47	95.64

Method	Testing Subset								Avg. Acc (%)
Method	Midjourney	SD V1.4	SD V1.5	ADM	GLIDE	Wukong	VQDM	BigGAN	Avg. Acc (%)
ResNet-50 [20]	54.9	99.9	99.7	53.5	61.9	98.2	56.6	52.0	72.1
DeiT-S [39]	55.6	99.9	99.8	49.8	58.1	98.9	56.9	53.5	71.6
Swin-T [27]	62.1	99.9	99.8	49.8	67.6	99.1	62.3	57.6	74.8
CNNDet [43]	52.8	96.3	95.9	50.1	39.8	78.6	53.4	46.8	64.2
Spec [47]	52.0	99.4	99.2	49.7	49.8	94.8	55.6	49.8	68.8
F3Net [32]	50.1	99.9	99.9	49.9	50.0	99.9	49.9	49.9	68.7
GramNet [26]	54.2	99.2	99.1	50.3	54.6	98.9	50.8	51.7	69.9
DIRE ${}^{*}$ [44]	60.2	99.9	99.8	50.9	55.0	99.2	50.1	50.2	70.7
Ojha et al. ${}^{*}$ [30]	73.2	84.2	84.0	55.2	76.9	75.6	56.9	80.3	73.3
GenDet	89.6	96.1	96.1	58.0	78.4	92.8	66.5	75.0	81.6

Method	Testing Subset						Avg. Acc (%)
Method	LR (112)	LR (64)	JPEG (q=65)	JPEG (q=30)	Blur ($\sigma$=3)	Blur ($\sigma$=5)	Avg. Acc (%)
ResNet-50 [20]	96.2	57.4	51.9	51.2	97.9	69.4	70.6
DeiT-S [39]	97.1	54.0	55.6	50.5	94.4	67.2	69.8
Swin-T [27]	97.4	54.6	52.5	50.9	94.5	52.5	67.0
CNNDet [43]	50.0	50.0	97.3	97.3	97.4	77.9	78.3
Spec [47]	50.0	49.9	50.8	50.4	49.9	49.9	50.1
F3Net [32]	50.0	50.0	89.0	74.4	57.9	51.7	62.1
GramNet [26]	98.8	94.9	68.8	53.4	95.9	81.6	82.2
DIRE ${}^{*}$ [44]	64.1	53.5	85.4	65.0	88.8	56.5	68.9
Ojha et al. ${}^{*}$ [30]	88.2	78.5	85.8	83.0	69.7	65.7	78.3
GenDet	85.4	84.1	94.4	94.3	84.6	82.9	87.6
