Indian Institute of Technology Bhilai

Generalized Deepfake Attribution

Sowdagar Mahammad Shahid, Sudev Kumar Padhi, Umesh Kashyap, and Sk. Subidh Ali
Abstract

The landscape of fake media creation changed with the introduction of Generative Adversarial Networks (GANs). Fake media creation has been on the rise with the rapid advances in generation technology, leading to new challenges in detecting fake media. A fundamental characteristic of GANs is their sensitivity to parameter initialization, which is determined by the seed. Each distinct seed used during training yields a unique model instance, resulting in divergent image outputs despite the same architecture. Thus, a single GAN architecture can produce countless GAN models depending on the seed used. Existing methods for attributing deepfakes work well only if they have seen the specific GAN model during training. If a GAN architecture is retrained with a different seed, these methods struggle to attribute the fakes. This seed dependency makes it difficult to attribute deepfakes with existing methods. We propose a generalized deepfake attribution network (GDA-Net) to attribute fake images to their respective GAN architectures, even if they are generated from a retrained version of the GAN architecture with a different seed (cross-seed) or from a fine-tuned version of an existing GAN model. Extensive experiments on cross-seed and fine-tuned data of GAN models show that our method is highly effective compared to existing methods. We have provided the source code to validate our results.

Keywords:
Model attribution, GAN fingerprints, Generative Adversarial Networks, Deep fake, Contrastive Learning

1 Introduction

Deepfakes are fake media (images, videos, audio, etc.) generated using deep learning methods [14, 19]. The development of deepfake technology has confronted visual forensics with several serious issues. Deepfakes leverage AI to generate realistic media capable of deceiving people, which raises concerns about their potential for spreading misinformation and infringing on privacy rights [1, 5, 22]. Deepfake detection is the process of identifying manipulated media, which is often accomplished by analyzing abnormalities or inconsistencies in the generated fake media. Various approaches have arisen to address the issue of deepfakes, including traditional forensic procedures, machine learning algorithms, and deep neural networks [10, 29, 4, 11, 7, 23]. Even though efforts have been made to identify generated fake media, the task of discriminating between real and fake media remains at an early stage. Along the same line, attributing fake media generated by Generative Adversarial Networks (GANs) is an important step in combating misinformation and identifying the architectures involved in the generation of fake media [16, 28, 27, 18, 24, 12, 8, 26, 6, 25]. A GAN is a type of deep learning model that has gained popularity for its ability to produce realistic images and videos that are often indistinguishable from real media. However, this capability also raises concerns regarding the proliferation of fake images and its potential societal impact. Law enforcement agencies face significant challenges in determining and validating the genuineness of any image. At the same time, GANs are hard to train: training demands substantial computational resources, data, and expertise from seasoned engineers, which makes a trained GAN high-value intellectual property. Therefore, trained GANs should be protected through patents, licensing, copyrights, and trademarks. This has led to increased interest in fingerprinting and attributing generative models (GANs), where the generator or its output images are labeled based on its fingerprint. This field is still in its early stages and needs extensive investigation before it matures.

Attributing GAN-generated images involves identifying the fingerprint or pattern of a specific model in its generated images. This can be thought of as mapping the ballistic fingerprint of a fired bullet to its gun. Previous research has mainly followed two approaches to identifying GAN-generated images. The first embeds a unique fingerprint in the training data such that the generated images contain the same fingerprint [16, 28, 27]. The second identifies the unique patterns left behind by different GAN architectures on generated images [18, 24, 12, 8, 26, 6, 11]. The first approach needs white-box access to the GANs for training attribution networks. The second approach is more popular, as GAN fingerprints may include distinctive patterns in the generated images, such as artifacts, textures, or stylistic features, which can indicate the underlying model’s characteristics without needing access to the model.

In terms of practical implementation, prior research on GAN attribution has solely addressed model-level attribution [18, 24, 12, 8, 26, 6, 11]. This means that both training and testing images are generated from a single model (a GAN architecture trained with a specific seed). The attribution model therefore overfits to the training data of seen models and fails to attribute images generated from unseen models. This overfitting highlights the inability of the attribution network to extract architecture-dependent features. One can bypass such a method by retraining or fine-tuning the generator for extra epochs, which alters its fingerprint and makes it a new, unseen generator model to the attribution network. Thus, an unlimited number of unseen generator models is possible even for a single architecture, which makes model-level attribution infeasible and impractical. It also makes it difficult to protect the intellectual property of GAN architectures, as merely retraining or fine-tuning the GAN for a few extra epochs leads to a new model. This motivated us to address the problem of fake image attribution in a broader context by attributing such images to the underlying architecture. In this paper, we introduce a novel approach utilizing the Generalized Deepfake Attribution Network (GDA-Net).

Architecture-level attribution requires attributing fake images to the architectures of the GAN models, irrespective of any modifications made to the GAN models, such as retraining with a new seed or fine-tuning for extra epochs. Despite being more general in scope than model-level attribution, architecture-level attribution remains a formidable task. In our experiments, we observe that a GAN architecture is likely to leave consistent architecture-dependent patterns in all its generated images. To highlight the traces of the architecture on the images, we use supervised contrastive learning [15] to train a Feature Extraction Network (FEN), which, with the help of a classifier network, can successfully attribute a fake image to its corresponding GAN architecture. To capture high-level content independent of architecture-level traces, we employ a denoising autoencoder. In summary, we make the following contributions:

  • We propose GDA-Net, which aims to attribute fake images to their source architectures, irrespective of whether the models producing those images have been retrained with an alternative seed or fine-tuned for additional epochs.

  • We devise FEN along with a denoising autoencoder to find the data-independent and architecture-dependent traces using supervised contrastive learning.

  • We conduct comprehensive evaluations of our attribution network on various GANs to demonstrate the accuracy and robustness of our approach in attribution.

Figure 1: The key difference between the existing and the proposed method. Existing methods focus on model-level attribution, while our proposed method focuses on architecture-level attribution. Thus, the existing methods fail when attribution is performed on images generated from a retrained or fine-tuned version of a GAN with the same architecture.

2 Related Work

2.1 Deepfake Attribution

Many methods have been proposed to tackle the increased use of deepfakes by differentiating real and fake images [10, 29, 3, 4, 11, 7, 23]. These methods solve one set of challenges, i.e., identifying fake images. Deepfakes are used to perform various malicious activities such as scamming and blackmailing. Moreover, training and fine-tuning models to generate high-quality deepfakes requires substantial resources and domain knowledge. Hence, there is a high chance that deepfake creators use existing methods to generate fake data. Thus, the original fake creator can be traced back through an attribution network. Identifying the source of these deepfakes is a huge help for law enforcement agencies. This highlights the need to attribute the models responsible for generating deepfakes, which will protect the generative model and restrict its misuse.

Existing deepfake attribution methods can be classified into two categories. Methods in the first category attribute fake images by retrieving a fingerprint from them [16, 28, 27]. Generally, these fingerprints are inserted into the training dataset of the generative model, so that the fingerprint extracted from a generated image identifies a particular generative model. The primary issue with this approach is the need for white-box access to the generative model. Another issue is that attribution cannot be performed on pre-trained models that were not trained on fingerprint-embedded datasets, and retraining existing models to embed fingerprints is time- and resource-consuming. Methods in the second category attribute fake images by finding unique patterns in the images generated by different GAN models [18, 24, 12, 8, 26, 6, 25]. These methods do not require access to the generative models and therefore align with real-world scenarios. They are mainly based on statistical and deep learning techniques. Statistical methods find residual noise in the GAN-generated image using denoising filters [18], or perform frequency analysis by transforming the GAN-generated image with the discrete cosine transform [8] or the discrete Fourier transform [12]. Deep learning methods use different image transformation techniques, filters, residual images, and loss functions to find subtle features, which are passed to a classifier for attribution. These methods can attribute GANs with high accuracy [24, 26, 25].

Still, almost all of these approaches focus on attributing seen models, i.e., both training and testing data are generated from the same trained generative model. They fail to generalize if the testing data is generated from a retrained version of the generative model (cross-seed or fine-tuned data). The authors of [25] addressed this issue and proposed a patchwise contrastive learning approach called DNA-Det, which focuses on image patches to identify GAN traces. Because their model operates at the patch level, any change in test image size results in a drastic failure in attribution. Hence, seed dependency remains a significant challenge in GAN attribution. The issue arises because an unlimited number of models can be generated by varying the seed, which complicates attributing a GAN-generated image to its origin. In this paper, we propose GDA-Net to attribute fake images to their respective GAN architectures, even if the fake images used in testing are generated from a version of the GAN architecture retrained with a different seed or from a fine-tuned version, as shown in Fig. 1.

2.2 Supervised Contrastive Learning

Supervised contrastive learning [15] is a technique in which pairs of data points belonging to the same class (positive samples) are drawn close together in the embedding space, while pairs of data points belonging to different classes (negative samples) are pushed farther apart, as shown in Fig. 2. Each sample (positive or negative) is passed through a convolutional neural network to extract high-level features, which are used to compute the supervised contrastive loss. Optimizing this loss brings positive samples close together in the embedding space and pushes negative samples far apart. The authors of [25] used supervised contrastive loss within their patch-wise contrastive learning technique and demonstrated improved fake attribution compared to their baseline. Similarly, the authors of [9] employed an unsupervised version of contrastive loss to train their feature extraction network, leading to improved deepfake detection performance. Contrastive learning is effective because it maintains the consistency of the extracted features. Leveraging this technique, we train our feature extraction model to obtain data-independent, seed-independent features corresponding to images generated from a specific GAN architecture, as shown in Fig. 2. The supervised contrastive loss is calculated using the following equation:

$$L_{supcontr} = \sum_{i \in I} \left( -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log\left( \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \right) \right) \qquad (1)$$

where $i \in I \equiv \{1, \ldots, 2N\}$ is the index of an arbitrary augmented sample in a mini-batch, $A(i) \equiv I \setminus \{i\}$, $P(i) \equiv \{p \in A(i) : \tilde{y}_p = \tilde{y}_i\}$ is the set of indices of all positive samples in the mini-batch distinct from $i$, and $|P(i)|$ is its cardinality. The notation follows [15], where $z_i$ denotes the embedding of sample $i$ produced by the projection network and $\tau$ is a scalar temperature parameter.
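To make the loss above concrete, the following is a minimal pure-Python sketch of Eq. (1) for a single mini-batch. It is an illustration under stated assumptions rather than the implementation used in this work: the function name `supcon_loss`, the default temperature of 0.1, and the L2 normalisation of the embeddings (as in [15]) are all assumptions.

```python
import math


def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss of Eq. (1) for one mini-batch.

    embeddings: list of equal-length, non-zero float vectors (one per sample).
    labels:     list of class labels, one per embedding.
    tau:        temperature; 0.1 is an illustrative value only.
    """
    # Assumption: embeddings are L2-normalised, so z_i . z_p is a cosine similarity.
    z = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        z.append([x / norm for x in vec])

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    n = len(z)
    for i in range(n):
        others = [a for a in range(n) if a != i]                    # A(i) = I \ {i}
        positives = [p for p in others if labels[p] == labels[i]]   # P(i)
        if not positives:
            continue  # an anchor without positives contributes nothing
        log_denom = math.log(sum(math.exp(dot(z[i], z[a]) / tau) for a in others))
        log_probs = [dot(z[i], z[p]) / tau - log_denom for p in positives]
        total += -sum(log_probs) / len(positives)
    return total


# Toy usage: two tight clusters in a 2-D embedding space.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supcon_loss(batch, ["SNGAN", "SNGAN", "ProGAN", "ProGAN"])
loss_mismatched = supcon_loss(batch, ["SNGAN", "ProGAN", "SNGAN", "ProGAN"])
```

When the labels agree with the geometric clusters, the loss is close to zero; when they cut across the clusters, it is much larger, which is the signal that pulls same-architecture embeddings together during training.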

Figure 2: Supervised contrastive learning brings embeddings of positive samples (image augmentation pair and image pair from the same class) closer and pushes negative samples (image pair from different classes) farther apart. In our case, image augmentations and the images generated from the same GAN architecture are brought closer, while images generated from different GAN architectures are pushed farther apart. The anchor image is generated from SNGAN. Thus, the embeddings of the anchor’s augmented image and of the image generated from the retrained SNGAN (seed-2) are brought close. Along the same line, the embeddings of images generated from SNGAN and ProGAN are pushed farther apart.

3 Proposed Approach

3.1 Problem definition

The popularity of GANs has inspired the research community to use them in various applications. This has led to the development of diverse GAN architectures with enhanced generation capabilities. GAN architecture attribution can be formulated as a multi-class classification task, where the aim is to attribute each image to its source GAN architecture or label it as real. Given an image $x^y$ with source $y \in Y = \{real, G_1, G_2, \ldots, G_N\}$, where $\{G_1, G_2, \ldots, G_N\}$ are different GAN architectures, our goal is to learn a mapping $f(x^y) \rightarrow y$.

3.2 Overview

The framework of GDA-Net is shown in Fig. 3 and Fig. 4. GDA-Net contains two sub-networks: a Feature Extraction Network (FEN) and a multi-class classification network. The two networks are trained separately. First, the FEN is trained to extract seed-independent features from fake images, which focus on the architecture of the GAN rather than on the data generated by it. These features essentially represent the fingerprints of the GAN architecture. Second, the multi-class classification network is trained on the features obtained from the FEN for final attribution. Our proposed approach uses two variations of FEN.

3.2.1 Feature extraction network (FEN)

Using deep learning to perform multi-class classification is a well-known approach. The same approach can be followed in GAN attribution, where the training set contains images generated from different GANs along with real images. A regular deep learning-based classifier can be trained on this data to perform GAN attribution. Although this intuitive approach works to a certain extent, training the classifier directly on GAN-generated images has a drawback: the classifier learns semantic features (low-level features) that are specific to the content of the image. Because different GANs can be trained to generate similar types of images, a classifier relying on semantic features extracts similar features from images generated by different GANs, which results in low accuracy. The FEN solves this issue by extracting semantic-invariant features. The goal of FEN is to extract features that are related to the architecture of the GAN and least dependent on the generated content. Essentially, FEN acts as a fingerprint identifier for the underlying GAN architecture. We propose two variations of the FEN: Vanilla-FEN and Denoiser-FEN.

3.2.2 Vanilla-FEN:

The input to Vanilla-FEN (Fig. 3) consists of real images (CelebA) and fake images generated from different GAN architectures. It outputs a 2048-dimensional feature embedding, which is further reduced to a $1 \times 128$-dimensional feature embedding by a deep neural network. The 2048-dimensional feature embedding is called the classification head and is used to train the classification network for attribution. The 128-dimensional feature embedding is called the projection head and is used to calculate the supervised contrastive loss for training the FEN. The idea behind training FEN with supervised contrastive loss is to obtain content-independent feature embeddings whose similarity is very high when they correspond to same-seed or cross-seed images of the same GAN architecture.

Figure 3: GDA-Net architecture using Vanilla-FEN for attributing GAN architectures. It consists of two networks: Feature Extraction Network (FEN) and Classification Network. FEN is trained by applying supervised contrastive loss on its 128-dimensional embedding output. The intermediate layer output (2048-dimensional) of FEN is used to train the classifier network for attribution.

3.2.3 Denoiser-FEN:

To enhance the capability of Vanilla-FEN in extracting content-independent features, we have to reduce the semantic dependency that arises from giving the image directly as input to FEN. Previous works [18, 8] have shown that high-level features are content-independent and can be used as a fingerprint of a GAN by extracting unique patterns using residual filters and frequency analysis. Motivated by [8, 18], we train a denoising autoencoder (DAE) on the real images. Once the DAE is trained, a generic image $X$, which can be a real or a GAN-generated image, is given as input. The high-level content of $X$ is calculated as $X_1 = h(X)$, where $h$ represents the trained DAE. We then calculate the residual of each image as $R = \mathrm{abs}(X - X_1)$, where $\mathrm{abs}$ denotes the element-wise absolute value. These residuals are semi-content-dependent (unique for each input image), as shown in Fig. 4. Hence, we cannot directly consider these residuals as fingerprints of a particular GAN. To extract the hidden fingerprint from these residuals, FEN is trained on them using supervised contrastive loss. Unlike in Vanilla-FEN, where the images are fed directly to FEN, the residuals extracted from all images (real and GAN-generated) are fed as input to the FEN of Denoiser-FEN. As in Vanilla-FEN, Denoiser-FEN has a projection head and a classification head, with $1 \times 128$- and $1 \times 2048$-dimensional feature embeddings, respectively.
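The residual step described above, i.e., the absolute difference between an image $X$ and its denoised version $h(X)$, can be sketched as follows. This is a hedged illustration only: the 3×3 mean filter `mean_filter_denoiser` is a hypothetical stand-in for the trained DAE $h$, and images are represented as plain 2-D lists of grayscale values to keep the example dependency-free.

```python
def mean_filter_denoiser(image):
    """Hypothetical stand-in for the trained DAE h: a 3x3 mean filter.

    The real h is a learned convolutional autoencoder; this filter only
    mimics the idea of producing a smoothed version of the input.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


def residual(image, denoiser):
    """Pixel-wise residual abs(X - h(X)) for a 2-D grayscale image."""
    denoised = denoiser(image)
    return [[abs(x - d) for x, d in zip(row_x, row_d)]
            for row_x, row_d in zip(image, denoised)]


# Toy usage: a single bright pixel on a dark background.
spike = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
spike_residual = residual(spike, mean_filter_denoiser)
```

A perfectly flat image yields an all-zero residual under this stand-in, while fine detail such as the isolated pixel survives in the residual. In the pipeline described above, it is this residual, not the raw image, that is passed to the FEN.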

Figure 4: GDA-Net architecture using Denoiser-FEN for attributing GAN architectures. It consists of three networks: Denoising autoencoder (DAE), Feature Extraction Network (FEN), and Classification Network. DAE and FEN together are referred to as Denoiser-FEN.

3.2.4 Multi-class classification network:

Our multi-class classification network is a deep neural network with fully connected layers. We train it on the $1 \times 2048$-dimensional feature embeddings produced by the classification head of FEN. This classification network makes the final attribution of the GAN architecture. Note that we train separate classifiers for Vanilla-FEN and Denoiser-FEN.

4 Experiments

4.1 Setup

In this section, we validate the effectiveness of GDA-Net by conducting in-depth experiments. We performed all our experiments on a machine with a 14-core Intel i9 10940X CPU, 128 GB RAM, and two Nvidia RTX-5000 GPUs with 16 GB VRAM each.

4.1.1 Dataset:

In the case of Vanilla-FEN, the training data consists of real images from the CelebA [17] dataset and fake images generated from trained GAN instances based on four different GAN architectures (DCGAN [21], WGAN [2], ProGAN [13], and SNGAN [20]) trained on CelebA, referred to as $\{G_1, G_2, G_3, G_4\}$, respectively. The classification network of Vanilla-FEN, used to attribute features to the respective GAN architecture, is trained with the output of the classification head of Vanilla-FEN (the $1 \times 2048$-dimensional feature embedding in Fig. 3). For Denoiser-FEN, the denoising autoencoder is trained on the CelebA dataset. The FEN of Denoiser-FEN is trained with the residual images of real images as well as fake images from GAN models of the four architectures DCGAN, WGAN, ProGAN, and SNGAN. The classifier used to attribute features to the respective GAN architecture is trained with the output of the classification head of Denoiser-FEN (the $1 \times 2048$-dimensional feature embedding in Fig. 4).

4.1.2 Model Architecture:

We use the encoder network and the projection network of [15] as our FEN for both Vanilla-FEN and Denoiser-FEN. The projection network is attached to the encoder network such that the output of the encoder network is the input to the projection network. The classification network contains 4 fully connected layers with 128, 64, 16, and 5 neurons, respectively, for both Vanilla-FEN and Denoiser-FEN. ReLU activation is used in the intermediate layers, and softmax in the final layer. In the case of Denoiser-FEN, the denoising autoencoder consists of an encoder and a decoder based on convolutional neural networks. The encoder and decoder contain 3 and 4 convolutional layers, respectively, excluding pooling and normalization layers.
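The classifier layout described above can be illustrated with a small forward-pass sketch in pure Python. It mirrors only the stated layer sizes (a 2048-dimensional input followed by fully connected layers of 128, 64, 16, and 5 neurons, with ReLU in between and softmax at the end). The weights are random and untrained, and the helper names `init_layers` and `forward` as well as the He-style initialisation scale are assumptions made for this example. The five outputs presumably correspond to the real class plus the four GAN architectures.

```python
import math
import random

# 2048-dimensional classification-head embedding, then the four FC layers.
LAYER_SIZES = [2048, 128, 64, 16, 5]


def init_layers(sizes, seed=0):
    """Create random (untrained) weights; the scale is an illustrative choice."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Fully connected forward pass: ReLU on hidden layers, softmax on the last."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return softmax(x)


# Toy usage with a constant vector standing in for a FEN embedding.
layers = init_layers(LAYER_SIZES)
probabilities = forward(layers, [0.1] * 2048)
```

The output is a probability distribution over five classes. In practice, a deep learning framework would be used for training; this sketch only makes the shapes and activations explicit.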

Figure 5: TSNE𝑇𝑆𝑁𝐸TSNEitalic_T italic_S italic_N italic_E plot of feature embeddings of training and testing data(cross-seed) generated from FEN𝐹𝐸𝑁FENitalic_F italic_E italic_N of Denoiser-FEN𝐹𝐸𝑁FENitalic_F italic_E italic_N. Fig A represents feature embedding space for training data which contains data generated from multiple instances of GAN𝐺𝐴𝑁GANitalic_G italic_A italic_N architectures (trained with multiple seeds). Fig B represents feature embedding space of testing data which contains data generated from a completely new instance of GAN𝐺𝐴𝑁GANitalic_G italic_A italic_N architectures(cross-seed).

4.1.3 Training Details:

We trained the FEN of both Vanilla-FEN and Denoiser-FEN using supervised contrastive loss with the SGD optimizer (learning rate 0.003), and the classifier network attached to both Vanilla-FEN and Denoiser-FEN using cross-entropy loss with the Adam optimizer (learning rate 0.00003). The denoising autoencoder is trained using mean absolute error with the Adam optimizer (learning rate 0.0001). To generate fake images, we trained five GAN instances for each of the four architectures $\{G_1, G_2, G_3, G_4\}$ on CelebA. We refer to these GAN instances as $G_j^i$, where $G_j$ ($1 \leq j \leq 4$) is the GAN architecture for which five GAN instances ($G_j^1, G_j^2, G_j^3, G_j^4, G_j^5$) are trained with different seeds. Out of these five instances ($G_j^i$) for each architecture ($G_j$), we generate images from four instances ($G_j^1, G_j^2, G_j^3, G_j^4$) for training and from the fifth instance ($G_j^5$) for testing our GDA-Net. While training GDA-Net, we pooled the generated images from the four instances of each GAN architecture and assigned four labels based on the GAN architectures and one label for real images. The benefit of pooling images from different seeds of the same GAN architecture during training is that we obtain similar embeddings for images generated from a particular GAN architecture, even when the instances are trained with different seeds, as shown in Fig 5.
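
The seed-based split described above can be sketched as follows. The dictionary layout, the label ids, and the function name are assumptions made for this illustration; only the counts (four architectures, five instances each, four instances for training and the fifth for testing) come from the text.

```python
# Architectures in the order G_1..G_4 used in the text.
ARCHITECTURES = ["DCGAN", "WGAN", "ProGAN", "SNGAN"]
NUM_INSTANCES = 5              # five differently seeded instances per architecture
TRAIN_INSTANCES = {1, 2, 3, 4}  # G_j^1..G_j^4 generate training images
REAL_LABEL = 0                 # assumed label id for real CelebA images


def build_split():
    """Return (train, test) lists of generator instances with architecture-level labels."""
    train, test = [], []
    for j, arch in enumerate(ARCHITECTURES, start=1):
        for i in range(1, NUM_INSTANCES + 1):
            # Every instance of one architecture shares the same label j,
            # so the seed is deliberately invisible to the classifier.
            source = {"architecture": arch, "instance": i, "label": j}
            if i in TRAIN_INSTANCES:
                train.append(source)
            else:
                test.append(source)
    return train, test


train_sources, test_sources = build_split()
# Five classes in total: one for real images plus one per architecture.
all_labels = {REAL_LABEL} | {s["label"] for s in train_sources}
```

Because the label depends only on the architecture index, images from all four training seeds of an architecture fall into one class, which is what lets the held-out fifth seed be evaluated as a cross-seed test.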

4.2 Results

We started our experiments with a simple setup using only two instances, $G_j^1$ and $G_j^2$, of each GAN architecture $G_j$ ($1 \leq j \leq 4$). We first trained a simple multiclass classifier with real images from CelebA and fake images from the $G_j^1$ models. When we tested this classifier with fake images from $G_j^1$, i.e., the same GAN instances used to generate the training data, it gave a high accuracy of 96%. However, when tested with fake images from the second instance $G_j^2$ of the same GAN architectures $G_j$, the classifier accuracy dropped drastically to 72% (second column of Table 1). This result shows that the training methodology and the complexity of the architecture used for attribution are not sufficient to extract architecture-level features from GAN-generated images. Subsequently, inspired by the work of [18, 8], we trained a denoising autoencoder on the CelebA dataset. We passed images generated from the two instances $G_j^1$ and $G_j^2$ of each GAN architecture $G_j$ ($1 \leq j \leq 4$) through the denoising autoencoder and obtained their residuals (the difference between the input and the output of the denoising autoencoder). We used residuals from one GAN instance ($G_j^1$) for training and residuals from the second instance ($G_j^2$) for testing the multiclass classifier. We obtained a slight improvement in test accuracy (72% to 73.3%) and macro F1 score (0.68 to 0.73), but the model was still incapable of extracting architecture-dependent traces.
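
The residual computation can be illustrated with a minimal sketch. Note that the denoiser here is a 3x3 mean filter used purely as a stand-in so the example runs without a trained model; it is not the convolutional denoising autoencoder of the paper, and the function names are hypothetical.

```python
def box_blur(image):
    """Stand-in denoiser: 3x3 mean filter with edge replication.

    This is NOT the paper's denoising autoencoder, only a placeholder
    with the same image-in, image-out interface.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def residual(image, denoiser):
    """Residual = input minus denoised output, pixel by pixel."""
    denoised = denoiser(image)
    return [[p - d for p, d in zip(row, den_row)]
            for row, den_row in zip(image, denoised)]


# A single bright pixel: the smooth content is removed, the sharp detail remains.
spike = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
spike_residual = residual(spike, box_blur)
```

The same `residual` function would apply unchanged if `denoiser` were replaced by a trained autoencoder, since it only relies on the input and output having the same shape.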

Table 1: Attribution of real and fake images under different experimental setups. We report accuracy (%) and macro F1 score for each combination. Closed-set means that training and testing images are generated from the same instances of the GAN architectures. Cross-seed means that training and testing images are generated from different instances of the GAN architectures. The setup with DAE, FEN, and a simple classification network, trained using real data (CelebA) and images generated from multiple instances of GAN architectures, yields the highest accuracy on cross-seed test data.

| Method / Metric | Single-seed + classifier | Single-seed + DAE + classifier | Multi-seed + DAE + classifier | Multi-seed + FEN + classifier | Multi-seed + DAE + FEN + classifier |
|---|---|---|---|---|---|
| Closed-set accuracy | 96 | 97 | 96.5 | 100 | 100 |
| Cross-seed accuracy | 72 | 73.3 | 89 | 95.2 | 99.2 |
| Cross-seed macro F1 score | 0.68 | 0.73 | 0.88 | 0.95 | 0.99 |

To further improve our results on cross-seed data (training and testing data generated from different instances of GAN architectures), we took five instances of each GAN architecture: images from four of the five GAN instances, along with real images, are used for training, and images from the fifth instance are used for testing. Residual images are generated from the training and test images using the denoising autoencoder as described above. The multiclass classifier is trained and tested with these residuals. This setup gave a significant improvement (approximately 16 percentage points, from 73.3% to 89% cross-seed accuracy) over the previous setup. These results motivated us to use data from multiple instances of GAN architectures during training.

Subsequently, we incorporated a FEN to build two variants of GDA-Net: Vanilla-FEN (Fig 3) and Denoiser-FEN (Fig 4). We trained both GDA-Net variants using supervised contrastive loss. With the inclusion of the FEN, our model's generalization capability increases, and it attributes the cross-seed fake images and real images almost perfectly (95.2% and 99.2% cross-seed accuracy), as shown in the last two columns of Table 1. The results show that Denoiser-FEN is the most suitable setup for GDA-Net, and we adopted the Denoiser-FEN variant as our final GDA-Net. To assess the importance of training on data generated from multiple instances of GAN architectures, we trained GDA-Net with data generated from a single instance of each GAN architecture and tested it with data generated from a new instance. We observed a reduction in attribution performance. The comparative results are shown in Table 3 and the confusion matrices are shown in Fig 6.
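
For readers unfamiliar with the supervised contrastive loss of [15], the sketch below computes it for a small batch in plain Python. The temperature value of 0.1, the choice to skip anchors that have no positive in the batch, and the toy embeddings are assumptions of this example, not settings reported in the paper.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    For anchor i with positives P(i) (same label, excluding i):
        loss_i = -1/|P(i)| * sum_{p in P(i)} log(
            exp(z_i . z_p / t) / sum_{a != i} exp(z_i . z_a / t))
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # assumption: anchors without positives contribute nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        peak = max(sims.values())  # log-sum-exp trick for numerical stability
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-d embeddings: two tight clusters.
toy_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
loss_matching = supcon_loss(toy_embeddings, [0, 0, 1, 1])    # labels follow clusters
loss_mismatched = supcon_loss(toy_embeddings, [0, 1, 0, 1])  # labels cut across clusters
```

The loss is small when same-label embeddings already sit together and large when they do not, which is the pressure that pulls images from different seeds of one architecture toward a shared region of the embedding space.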

4.2.1 Evaluation on cross-seed data:

We tested our GDA-Net with cross-seed data (images generated from the fifth instance of the GAN architectures and unseen CelebA data). Our cross-seed data contains images generated from the fifth instance of all the GAN architectures used in training GDA-Net. From the five instances of each GAN architecture, we took multiple combinations of GAN instances to generate data for training and testing GDA-Net, and we observed similar results on all these combinations. The authors of the SOTA method [25] tested only on cross-seed data of ProGAN, while the training data for their model (DNA-Det) is generated from ProGAN, SNGAN, MMDGAN, and InfoMaxGAN. To test the SOTA method on cross-seed data of all the GAN architectures used in training, we trained the SOTA model using their experimental setup with our training data and tested it with our testing data. The test results are discussed in Section 4.3.
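
The two metrics reported for cross-seed evaluation, accuracy and macro F1 score, can both be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the 2x2 matrix is a made-up toy example and does not reproduce any result from the paper.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def macro_f1(confusion):
    """Unweighted mean of per-class F1 scores (rows: true, columns: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


toy_confusion = [[8, 2],
                 [1, 9]]
toy_accuracy = accuracy(toy_confusion)
toy_macro_f1 = macro_f1(toy_confusion)
```

Macro averaging gives every class the same weight regardless of how many test images it has, so one poorly attributed architecture lowers the score noticeably even when overall accuracy stays high.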

Table 2: Attribution results of GDA-Net on fine-tuned GAN data generated by fine-tuning the GAN models for 10 additional epochs.

| GDA-Net | Epoch: x | Epoch: x+10 |
|---|---|---|
| DCGAN | 99.96 | 99.3 |
| SNGAN | 97.13 | 96.8 |
| WGAN | 99.55 | 98 |

Table 3: Attribution results of GDA-Net trained with single-seed and multiple-seed data.

| GDA-Net | Single-Seed | Multiple-Seed |
|---|---|---|
| Closed-set accuracy | 100 | 100 |
| Cross-seed accuracy | 77 | 99.2 |
| Cross-seed macro F1 score | 0.76 | 0.99 |

4.2.2 Effect of fine-tuning:

Whenever a GAN model is fine-tuned for additional epochs, its generation capability changes, and the images it generates for the same noise input will be different. A robust attribution network should not be sensitive to fine-tuning of GAN models as long as the underlying GAN architecture remains the same. To check whether fine-tuning affects the attribution capability of our attribution network (GDA-Net), we performed the following experiment. We fine-tuned instances of three GAN architectures (DCGAN, WGAN, SNGAN) on the training data for 10 additional epochs and refer to the images they generate as fine-tuned GAN data. We tested GDA-Net with this fine-tuned GAN data and obtained satisfactory results, as shown in Table 2. These results show the robustness of GDA-Net to fine-tuning of GAN models. We also tested the existing attribution methods with this fine-tuned GAN data; the results are discussed in Section 4.3. Here, fine-tuning means taking a pre-trained GAN model and resuming its training with the same training data used in the initial training.
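
To make the size of the fine-tuning effect concrete, the short sketch below computes the per-architecture drop in attribution accuracy from the values reported in Table 2. The variable and function names are chosen for this illustration only.

```python
# Accuracies (%) copied from Table 2: (epoch x, epoch x+10).
TABLE_2 = {
    "DCGAN": (99.96, 99.3),
    "SNGAN": (97.13, 96.8),
    "WGAN": (99.55, 98.0),
}


def accuracy_drops(table):
    """Percentage-point drop after fine-tuning, rounded to two decimals."""
    return {name: round(before - after, 2)
            for name, (before, after) in table.items()}


drops = accuracy_drops(TABLE_2)
largest_drop = max(drops.values())
```

The largest drop is 1.55 percentage points (WGAN), and every architecture stays above 96% after the additional epochs.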

Figure 6: Fig A shows the confusion matrix for GDA-Net trained on a single seed, and Fig B shows the confusion matrix for GDA-Net trained on multiple seeds. Together they show that GDA-Net trained with single-seed data is more confused in the attribution task, whereas GDA-Net trained with multiple-seed data attributes GAN architectures with high accuracy and very little confusion.

4.3 Comparison

We compared our GDA-Net with the existing deepfake attribution methods LeveFreq [8] and AttNet [26] and the SOTA method, DNA-Det [25]. Except for DNA-Det, all these methods focus on attributing GAN models seen during training (both training and testing data generated from the same trained GAN model). The authors of DNA-Det tested their model only on cross-seed data of ProGAN. To make a proper comparison, we trained AttNet, LeveFreq, and DNA-Det with real images from the CelebA dataset and fake images generated from ProGAN, SNGAN, DCGAN, and WGAN, following the experimental setups of the respective methods. The test results are shown in Table 4. From these results, it is clear that the existing attribution models perform well on closed-set data and give satisfactory results on cross-seed data of ProGAN and WGAN, as shown in the second, third, and sixth columns of Table 4, respectively. Even though all the models perform satisfactorily on cross-seed data of ProGAN and WGAN, a substantial drop in accuracy is observed for AttNet, LeveFreq, and DNA-Det on the cross-seed data of SNGAN and DCGAN. We also tested the existing methods with the fine-tuned GAN data. The accuracy of the existing methods dropped significantly (below 79%), as shown in the last column of Table 4. This shows that GDA-Net outperforms the existing methods in GAN architecture attribution under both cross-seed and fine-tuning scenarios.

Table 4: Comparison of GDA-Net with existing methods. The table shows accuracies (%) obtained under closed-set, cross-seed, and fine-tuning scenarios. Column 2 is the net accuracy on closed-set data, which includes real and fake data from all four GAN models. Columns 3 to 6 are the accuracies for images generated by the individual GAN architectures. Column 7 is the net accuracy on fine-tuned GAN data. The results highlight the generalization capability and efficiency of GDA-Net in correctly attributing GAN-generated images.

| Method | Closed-Set | Cross-Seed ProGAN | Cross-Seed SNGAN | Cross-Seed DCGAN | Cross-Seed WGAN | Fine-tune |
|---|---|---|---|---|---|---|
| LeveFreq | 99.50 | 83.50 | 15.78 | 38.56 | 76.45 | 33.38 |
| AttNet | 98.88 | 89.44 | 17.83 | 14.69 | 92.46 | 47.50 |
| DNA-Det | 100.00 | 95.05 | 5.26 | 69.14 | 94.12 | 78.01 |
| GDA-Net | 100.00 | 99.90 | 97.13 | 99.96 | 99.55 | 98.57 |
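
As a compact way to read the cross-seed columns of Table 4, the sketch below computes the mean and the worst-case accuracy of each method over the four architectures. The numbers are copied from Table 4; the unweighted mean is a summary introduced for this illustration and is not a metric reported in the paper.

```python
# Cross-seed accuracies (%) from Table 4, in the order ProGAN, SNGAN, DCGAN, WGAN.
TABLE_4_CROSS_SEED = {
    "LeveFreq": [83.50, 15.78, 38.56, 76.45],
    "AttNet": [89.44, 17.83, 14.69, 92.46],
    "DNA-Det": [95.05, 5.26, 69.14, 94.12],
    "GDA-Net": [99.90, 97.13, 99.96, 99.55],
}


def summarize(table):
    """Per-method unweighted mean and worst-case cross-seed accuracy."""
    return {method: {"mean": sum(values) / len(values), "worst": min(values)}
            for method, values in table.items()}


summary = summarize(TABLE_4_CROSS_SEED)
# Method whose weakest architecture is still attributed best.
most_robust = max(summary, key=lambda method: summary[method]["worst"])
```

The worst-case column makes the gap clearest: each existing method falls below 20% on at least one architecture, while the lowest GDA-Net value is 97.13%.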

5 Conclusion

In this paper, we proposed a generalized deepfake attribution network (GDA-Net) for attributing GAN-generated images to their source architecture. The main goal of our method is to find traces in the generated images that are specific to the GAN architecture. To achieve this goal, we introduced a feature extraction network (FEN) that extracts architecture-level traces from the generated images using supervised contrastive learning. To further ensure that GDA-Net focuses on the model architecture traces, we added a denoising autoencoder so that the FEN receives a feature map with minimal content-dependent traces. To show the generalization of GDA-Net, we used four different GAN architectures (DCGAN, WGAN, ProGAN and SNGAN) and showed that our method can correctly attribute the generated images. We also compared our method with the existing attribution networks to highlight the effectiveness of GDA-Net.

References

  • [1] Amezaga, N., Hajek, J.: Availability of voice deepfake technology and its impact for good and evil. In: Proceedings of the 23rd Annual Conference on Information Technology Education. pp. 23–28 (2022)
  • [2] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International conference on machine learning. pp. 214–223. PMLR (2017)
  • [3] Asnani, V., Yin, X., Hassner, T., Liu, X.: Reverse engineering of generative models: Inferring model hyperparameters from generated images. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)
  • [4] Chai, L., Bau, D., Lim, S.N., Isola, P.: What makes fake images detectable? understanding properties that generalize. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16. pp. 103–120. Springer (2020)
  • [5] Chauhan, A.: Deepfakes strike deep in Gujarat too. Times of India (2023), https://timesofindia.indiatimes.com/city/ahmedabad/deepfakes-strike-deep-in-gujarat-too/articleshow/105136316.cms
  • [6] Ding, Y., Thakur, N., Li, B.: Does a gan leave distinct model-specific fingerprints. In: Proceedings of the BMVC (2021)
  • [7] Durall, R., Keuper, M., Keuper, J.: Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7890–7899 (2020)
  • [8] Frank, J., Eisenhofer, T., Schönherr, L., Fischer, A., Kolossa, D., Holz, T.: Leveraging frequency analysis for deep fake image recognition. In: International conference on machine learning. pp. 3247–3258. PMLR (2020)
  • [9] Fung, S., Lu, X., Zhang, C., Li, C.T.: Deepfakeucl: Deepfake detection via unsupervised contrastive learning. In: 2021 International Joint Conference on Neural Networks (IJCNN). pp. 1–8 (2021). https://doi.org/10.1109/IJCNN52387.2021.9534089
  • [10] Guarnera, L., Giudice, O., Battiato, S.: Deepfake detection by analyzing convolutional traces. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 666–667 (2020)
  • [11] Jeon, H., Bang, Y., Kim, J., Woo, S.S.: T-gd: Transferable gan-generated images detection framework. arXiv preprint arXiv:2008.04115 (2020)
  • [12] Joslin, M., Hao, S.: Attributing and detecting fake images generated by known gans. In: 2020 IEEE Security and Privacy Workshops (SPW). pp. 8–14. IEEE (2020)
  • [13] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)
  • [14] Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J., Aila, T.: Alias-free generative adversarial networks. Advances in neural information processing systems 34, 852–863 (2021)
  • [15] Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. Advances in neural information processing systems 33, 18661–18673 (2020)
  • [16] Kim, C., Ren, Y., Yang, Y.: Decentralized attribution of generative models. arXiv preprint arXiv:2010.13974 (2020)
  • [17] Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale celebfaces attributes (celeba) dataset. Retrieved August 15(2018), 11 (2018)
  • [18] Marra, F., Gragnaniello, D., Verdoliva, L., Poggi, G.: Do gans leave artificial fingerprints? In: 2019 IEEE conference on multimedia information processing and retrieval (MIPR). pp. 506–511. IEEE (2019)
  • [19] Mirsky, Y., Lee, W.: The creation and detection of deepfakes: A survey. ACM computing surveys (CSUR) 54(1), 1–41 (2021)
  • [20] Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)
  • [21] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
  • [22] Sjouwerman, S.: Deepfake phishing: The dangerous new face of cybercrime. Forbes (2024), https://www.forbes.com/sites/forbestechcouncil/2024/01/23/deepfake-phishing-the-dangerous-new-face-of-cybercrime/?sh=59f64e1f4aed
  • [23] Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: Cnn-generated images are surprisingly easy to spot… for now. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8695–8704 (2020)
  • [24] Xuan, X., Peng, B., Wang, W., Dong, J.: Scalable fine-grained generated image classification based on deep metric learning. arXiv preprint arXiv:1912.11082 (2019)
  • [25] Yang, T., Huang, Z., Cao, J., Li, L., Li, X.: Deepfake network architecture attribution. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 4662–4670 (2022)
  • [26] Yu, N., Davis, L.S., Fritz, M.: Attributing fake images to gans: Learning and analyzing gan fingerprints. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 7556–7566 (2019)
  • [27] Yu, N., Skripniuk, V., Abdelnabi, S., Fritz, M.: Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In: Proceedings of the IEEE/CVF International conference on computer vision. pp. 14448–14457 (2021)
  • [28] Yu, N., Skripniuk, V., Chen, D., Davis, L., Fritz, M.: Responsible disclosure of generative models using scalable fingerprinting. arXiv preprint arXiv:2012.08726 (2020)
  • [29] Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., Yu, N.: Multi-attentional deepfake detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2185–2194 (2021)