\addbibresource{main.bib}

AIGC: AI-generated content
FNR: false negative rate
FDR: false discovery rate
TPR: true positive rate
FPR: false positive rate
GAN: generative adversarial network
TPDNE: thispersondoesnotexist.com
TF-IDF: term frequency–inverse document frequency

AI-Generated Faces in the Real World:
A Large-Scale Case Study of Twitter Profile Images

Jonas Ricker (ORCID 0000-0002-7186-3634), Ruhr University Bochum, Bochum, Germany
Dennis Assenmacher (ORCID 0000-0001-9219-1956), GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Thorsten Holz (ORCID 0000-0002-2783-1264), CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Asja Fischer (ORCID 0000-0002-1916-7033), Ruhr University Bochum, Bochum, Germany
Erwin Quiring (ORCID 0009-0004-7170-1274), Ruhr University Bochum, Bochum, Germany, and ICSI, Berkeley, USA
Abstract.

Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking.

In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052\,\% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.

AI-Generated Content, Fake Image Detection, Social Networks
ccs: Computing methodologies, Machine learning
ccs: Security and privacy, Human and societal aspects of security and privacy
ccs: Information systems, Social networks

1. Introduction

The emergence of generative artificial intelligence (AI) has revolutionized content creation, enabling us to produce highly authentic and diverse outputs, such as images, videos, texts, and music that bear a striking resemblance to human-created media. These AI-driven systems have become ubiquitous in various areas of our society and provide reliable support in numerous applications. Among other use cases, they streamline the writing of emails and texts or enhance programming with advanced code completion tools. However, alongside its impressive benefits, generative AI also has the potential for significant detrimental effects. A pressing problem is the ability to generate compellingly realistic but false content, which can be used as a way to spread misinformation, manipulate people, and influence public opinion.

In a significant action in late 2019, Facebook dismantled an extensive network of over 900 accounts, pages, and groups that had collectively spent more than 9 million USD on advertisements promoting Donald Trump, potentially impacting the 2020 US presidential election \parencite{nimmoOperationFFSFakeFace2019}. A notable feature of this network was the use of AI-generated profile images, possibly taken from the website thispersondoesnotexist.com (TPDNE), which became operational in February 2019. Using NVIDIA’s StyleGAN \parencite{karrasStylebasedGeneratorArchitecture2019}, TPDNE generates a new facial image every time the page is refreshed, making such images easily accessible to everyone. Since this incident, the use of AI-synthesized faces in disinformation campaigns has been on the rise, likely because such images reduce the risk of detection through reverse image searches \parencite{goldsteinHowDisinformationEvolved2021}. Investigations have revealed that many of these deceptive clusters operate with state interests in mind, seeking to bolster specific narratives \parencite{nimmoOperationNavalGazing2020,nimmoSpamouflageGoesAmerica,strickAnalysisProchinaPropaganda2021,stanfordinternetobservatoryAnalysisTwitterTakedowns2020} or interfere in the domestic policies of foreign states \parencite{nimmoIRAAgainUnlucky2020,graphikateamStepMyParler2020,graphikateamFakeClusterBoosts2021}. Additionally, there are efforts to influence public opinion \parencite{stanfordinternetobservatoryReplyguysGoHunting2020,strickWestPapuaNew2020} or establish connections with unsuspecting social media users \parencite{vincentSpyReportedlyUsed2019,goldsteinResearchNoteThis2022}. The FBI and Europol have expressed concerns that the use of AI-generated content in cybercrime and foreign influence operations is expected to grow steadily \parencite{MaliciousActorsAlmost2021,Europol2024}.
Given these examples, it is essential to understand how well AI-generated images can be detected, how prevalent they are, and how they are used in the wild rather than in a lab setting.

In this work, we tackle this challenge by concentrating on the phenomenon of AI-generated images in social media. At the time of writing, it is becoming increasingly difficult for humans to differentiate these machine-generated media from authentic photographs, as evidenced by recent studies \parencite{hulzeboschDetectingCNNgeneratedFacial2020, tucciarelliRealnessPeopleWho2020, nightingaleSyntheticFacesHow2021, shenStudyHumanPerception2021, lagoMoreRealReal2022, nightingaleAIsynthesizedFacesAre2022, frankRepresentativeStudyHuman2023}. Although the detection of generated images has been explored extensively in lab settings, there is a surprising lack of comprehensive research addressing their identification and widespread use on social media platforms in real-world contexts. In this paper, we provide the first systematic and large-scale study of AI-generated profile images on Twitter. Our research is founded on three main pillars.

First, we develop a fast and effective detection pipeline tailored to the identification of AI-generated images in real-world scenarios. This task presents unique challenges, including the lack of a definitive ground truth and the diversity of possible image manipulations. To address these problems, we design the detection pipeline step by step. We consider different dataset types, apply a pre-filter to discard images that contain no face or only a very small one, and adapt a state-of-the-art classification model to specifically target synthetic profile images on Twitter. As mentioned above, observations suggest that the majority of AI-generated profile images originate from TPDNE, which is why we tailor our detection pipeline to this kind of fake face. Finally, we integrate various tools that help with the manual labeling that is required to estimate error rates on unlabeled in-the-wild data. We study each component of our system in controlled setups and show that the pipeline is capable of accurately recognizing AI-generated images.

Second, we analyze a large collection of 14\,989\,385 Twitter profile pictures to determine how prevalent AI-generated profile pictures are on the platform. We identify 7\,723 accounts that use such images, which corresponds to a prevalence rate of 0.052\,\%. This result indicates a notable presence of generated profile images on Twitter. We also assess the accuracy and reliability of our findings by estimating error rates. We estimate the false negative rate (FNR), i.e., the fraction of fake images mislabeled as real, of our approach to lie between 2.88\,\% and 3.03\,\%, and the false discovery rate (FDR), i.e., the fraction of real images among all images classified as fake, to be 1.4\,\%. The results suggest a low error rate of our method.
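
The rates in this paragraph are simple ratios. As a minimal sketch, the following Python snippet recomputes the prevalence from the two counts stated above and spells out the FNR and FDR definitions. The function names are illustrative assumptions and not part of our implementation.

```python
def prevalence_percent(n_flagged: int, n_total: int) -> float:
    """Share of profile images flagged as AI-generated, in percent."""
    return 100.0 * n_flagged / n_total


def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP): fraction of fake images labeled as real."""
    return fn / (fn + tp)


def false_discovery_rate(fp: int, tp: int) -> float:
    """FDR = FP / (FP + TP): fraction of real images among images labeled fake."""
    return fp / (fp + tp)


# Counts reported in the text: 7,723 flagged accounts out of 14,989,385 images.
print(f"{prevalence_percent(7_723, 14_989_385):.3f} %")  # prints "0.052 %"
```

Note that the FNR is defined relative to all fake images, whereas the FDR is defined relative to all images the pipeline labels as fake, so the two rates are not complementary.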

Third, we contextualize the use of AI-generated profile pictures on Twitter by examining the corresponding accounts and their tweets. Our results show clear differences between accounts with generated profile images and accounts with real ones: accounts with fake images tend to have lower social engagement as well as fewer followers and followed accounts. Despite the generally lower activity, some accounts with fake images are very active, suggesting possible involvement in spam campaigns. In addition, accounts with fake images are often newer and are suspended more frequently by Twitter, indicating inauthentic behavior. A significant portion of these accounts was created in bulk shortly before our data collection, which is a common pattern for accounts created for message amplification, disinformation campaigns, or similar disruptive activity. This impression is confirmed by our textual analysis of the accounts’ tweets. We identify large clusters spamming very similar content, frequently referring to giveaways, cryptocurrencies, and pornography. Notably, we also observe accounts that engage in contentious or political topics, such as the war in Ukraine, debates on COVID and vaccinations, and election-related discourse.

Contributions

We make the following key contributions:

  1. Detection Pipeline. We propose a multi-step pipeline for detecting AI-generated profile images on social media. We evaluate each stage in a controlled setup and demonstrate the pipeline’s suitability for real-world settings.

  2. Prevalence Study on Twitter. We apply our pipeline to 14\,989\,385 authentic profile images to systematically study the prevalence of AI-generated faces on Twitter. We identify 7\,723 accounts with generated profile images, corresponding to a prevalence rate of 0.052\,\%.

  3. Account and Tweet Analysis. We analyze the user metrics and tweets of accounts using AI-generated profile images to learn more about their intended purpose. We identify prevalent topics and find that a significant number of accounts appear to participate in coordinated inauthentic behavior.

2. Background

We start by providing a short primer on the creation and detection of AI-generated images.

AI-Generated Content (AIGC)

AIGC, sometimes also referred to as “deepfakes”, is content that appears authentic to humans but is synthesized or altered using a deep neural network. It is most prominently associated with manipulated videos in which the face of a person is replaced with a different one \parencite{mirskyCreationDetectionDeepfakes2021}, but also encompasses other types of media including images, audio, and text. While AIGC offers great creative potential, it is also used for malicious purposes, including defamatory images and videos \parencite{coleAIassistedFakePorn2017}, voice cloning \parencite{gaoVoiceImpersonationUsing2018,damianiVoiceDeepfakeWas}, fake customer reviews \parencite{yaoAutomatedCrowdturfingAttacks2017}, and machine-generated posts on social media \parencite{fagniTweepFakeDetectingDeepfake2021,goldsteinGenerativeLanguageModels2023}.

Image Synthesis

Learning a probability distribution from samples in order to generate novel samples is a longstanding challenge, especially in the high-dimensional image domain. Besides variational autoencoders (VAEs) \parencite{kingmaAutoencodingVariationalBayes2014} and autoregressive models \parencite{oordPixelRecurrentNeural2016a,vandenoordConditionalImageGeneration2016}, generative adversarial networks (GANs) \parencite{goodfellowGenerativeAdversarialNets2014} have proven to be effective in synthesizing high-quality images \parencite{zhuUnpairedImagetoimageTranslation2017,choiStarGANUnifiedGenerative2018,karrasProgressiveGrowingGANs2018,karrasStylebasedGeneratorArchitecture2019,karrasAnalyzingImprovingImage2020,karrasAliasFreeGenerativeAdversarial2021,sauerProjectedGANsConverge2021,kangScalingGANsTexttoimage2023}. The StyleGAN family \parencite{karrasStylebasedGeneratorArchitecture2019,karrasAnalyzingImprovingImage2020,karrasAliasFreeGenerativeAdversarial2021} received special attention due to its ability to generate faces that are practically indistinguishable from real ones \parencite{nightingaleAIsynthesizedFacesAre2022}. Recently, it has been shown that diffusion models (DMs) \parencite{sohl-dicksteinDeepUnsupervisedLearning2015,hoDenoisingDiffusionProbabilistic2020,dhariwalDiffusionModelsBeat2021} are able to match and even surpass the visual quality of GAN-generated images.

Generated Image Detection

There is a continuing arms race between effective detection techniques and newer generations of image synthesis algorithms. Broadly speaking, generated image detection techniques can be divided into two categories: methods that rely on handcrafted features and learning-based methods. Methods from the first category either exploit visual defects (e.g., facial inconsistencies \parencite{maternExploitingVisualArtifacts2019}, impossible reflections \parencite{huExposingGANGeneratedFaces2021}, irregular pupil shapes \parencite{guoEyesTellAll2022}) or “invisible” characteristics such as frequency artifacts \parencite{zhangDetectingSimulatingArtifacts2019,durallWatchYourUpconvolution2020,frankLeveragingFrequencyAnalysis2020,chandrasegaranCloserLookFourier2021,schwarzFrequencyBiasGenerative2021,chenSSDGANMeasuringRealness2021}, pixel statistics \parencite{natarajDetectingGANGenerated2019,mccloskeyDetectingGANgeneratedImagery2019}, or model-specific properties \parencite{marraGANsLeaveArtificial2019,yuAttributingFakeImages2019,rickerAEROBLADE2024}. Learning-based methods, on the other hand, use neural networks to learn a suitable feature representation to distinguish fake from real images \parencite{marraDetectionGANgeneratedFake2018,chaiWhatMakesFake2020,hulzeboschDetectingCNNgeneratedFacial2020,wangCNNgeneratedImagesAre2020,gragnanielloAreGANGenerated2021,cozzolinoUniversalGANImage2021,ojhaUniversalFakeImage2023,corviDetectionSyntheticImages2023}.

3. Methodology

A large-scale study on generated images in the wild comes with multiple challenges. First, we do not know the ground truth. As a result, it is difficult to estimate the number of overlooked generated images (false negatives) and to be sure that an image detected as generated is actually generated (precision). Finally, studying millions of images is computationally expensive, so the detection method also has to be efficient. We discuss further challenges and limitations of our study in Section 8. To deal with all these challenges, we carefully design a multi-step detection pipeline, which we describe step by step in the following. Note that while the presented approach is applied to Twitter, our method can be adapted to any other social network. We provide implementation details in Appendix A.

3.1. Data Collection

We describe the four types of datasets that we use for studying generated images, with Twitter being our use case. Table 1 summarizes our notation.

In-The-Wild Dataset $\mathcal{D}_{W}^{\mathbb{X}}$

To estimate the prevalence of generated images on a social network, it is important to obtain a mostly unconditional sample. In the case of Twitter, this can be achieved by using the API endpoint that provides real-time access to a random 1\,\% subset of all publicly posted tweets. We download each author’s profile image together with their profile metadata (cf. Section A.1 for an overview). Note that this approach only enables us to obtain profile images from users who write posts during the data collection period. Additionally, we omit users who have not set a profile image, that is, who are using Twitter’s default profile image. From March 7 to March 15, 2023, we collected 14\,989\,385 profile images.

Labeled Datasets $\mathcal{D}_{R}$/$\mathcal{D}_{F}$ and Variations

We continue with labeled datasets of fake and real images which can be used to train a detector. As discussed in Section 1, existing observations suggest that the vast majority of generated profile images on Twitter are taken from TPDNE, which generates images with StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020} trained on the FFHQ \parencite{karrasStylebasedGeneratorArchitecture2019} dataset\footnote{When published in 2019, TPDNE used the original StyleGAN \parencite{karrasStylebasedGeneratorArchitecture2019}, but switched to StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020} shortly after its release.}. We therefore decide to focus on this specific kind of fake face and use 10\,000 images from TPDNE as our fake-labeled dataset (denoted by $\mathcal{D}_{F}$) and correspondingly 10\,000 images from FFHQ as our real-labeled dataset (denoted by $\mathcal{D}_{R}$). We discuss this limitation of focusing on TPDNE in Section 8. As prior work shows that processing operations like resizing and compression can affect the detection \parencite{parmarOnAliasedResizing2022, mandelliTrainingCNNsPresence2020}, we consider two dataset variations:

  • $\mathcal{D}_{R}^{\mathbb{X}}$ and $\mathcal{D}_{F}^{\mathbb{X}}$. To obtain profile images with the social network’s processing steps, we adapt the approach from \textcite{boatoTrueFaceDatasetDetection2022}. We upload both $\mathcal{D}_{R}$ and $\mathcal{D}_{F}$ to Twitter, set each image as profile image, and then download all images again. We denote these processed images by $\mathcal{D}_{R}^{\mathbb{X}}$ and $\mathcal{D}_{F}^{\mathbb{X}}$.

  • $\mathcal{D}_{R}^{\mathbb{X}^{\prime}}$ and $\mathcal{D}_{F}^{\mathbb{X}^{\prime}}$. We additionally simulate a user who zooms into the profile image during the upload, as is common on social media platforms. We denote these images by $\mathcal{D}_{R}^{\mathbb{X}^{\prime}}$ and $\mathcal{D}_{F}^{\mathbb{X}^{\prime}}$, respectively.

We confirm in Section 7.2 that considering the preprocessing indeed improves the detection performance under realistic conditions.

Proxy-Labeled Real Dataset $\mathcal{D}_{P}^{\mathbb{X}}$

Social media platforms often have very popular users with a lot of followers. These popular users are rather unlikely to use deceptive fake images. Hence, we can build a proxy-labeled dataset with presumably real images. In particular, we select 10\,000 profile images from the accounts in $\mathcal{D}_{W}^{\mathbb{X}}$ with the highest numbers of followers that also pass our pre-filter (which is presented in the next section). We denote the so-created proxy-labeled dataset of real profile images by $\mathcal{D}_{P}^{\mathbb{X}}$.
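
The selection rule can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the `Account` record, its field names, and the `passes_prefilter` callable are hypothetical placeholders, not the data structures of our implementation.

```python
from typing import Callable, Iterable, List, NamedTuple


class Account(NamedTuple):
    """Hypothetical record for one account of the in-the-wild dataset."""
    user_id: str
    followers: int
    image_path: str


def select_proxy_real(
    accounts: Iterable[Account],
    passes_prefilter: Callable[[str], bool],
    n: int = 10_000,
) -> List[Account]:
    """Return the n most-followed accounts whose profile image passes the pre-filter."""
    # Rank all accounts by follower count, most popular first.
    ranked = sorted(accounts, key=lambda a: a.followers, reverse=True)
    selected: List[Account] = []
    for account in ranked:
        # Skip images without a sufficiently large face.
        if passes_prefilter(account.image_path):
            selected.append(account)
            if len(selected) == n:
                break
    return selected
```

Filtering while walking down the ranked list, rather than taking the top `n` first and filtering afterwards, keeps the resulting dataset at the intended size.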

Documented Fakes Dataset $\mathcal{D}_{D}^{\mathbb{X}}$

Finally, there are documented cases of generated profile images that were discovered manually. For example, blog posts regularly report such images when analyzing inauthentic Twitter accounts \parencite{nortenoConspiradorNortenoSubstack2024}. These cases can be used to build a labeled dataset of fake images in the wild, which we denote by $\mathcal{D}_{D}^{\mathbb{X}}$. Such a dataset is not free of bias, but provides a good means to finally check the performance of our classifier on an independent source. For our study, we use a dataset of 1\,353 generated Twitter profile images that were manually collected between November 2022 and May 2023 \parencite{yangCharacteristicsPrevalenceFake2024}.

Table 1. Dataset notation. The symbol $\mathbb{X}$ indicates that images were processed by Twitter.
\begin{tabular}{ll}
\hline
Symbol & Description \\
\hline
$\mathcal{D}_{W}^{\mathbb{X}}$ & Unlabeled dataset of Twitter profile images. \\
$\mathcal{D}_{F}$ & Labeled dataset of fake images. \\
$\mathcal{D}_{F}^{\mathbb{X}}$ & Labeled dataset of fake images uploaded as profile image and downloaded afterward. \\
$\mathcal{D}_{F}^{\mathbb{X}^{\prime}}$ & Version of $\mathcal{D}_{F}^{\mathbb{X}}$ where images are zoomed into during upload. \\
$\mathcal{D}_{R}$ & Labeled dataset of real images. \\
$\mathcal{D}_{R}^{\mathbb{X}}$ & Labeled dataset of real images uploaded as profile image and downloaded afterward. \\
$\mathcal{D}_{R}^{\mathbb{X}^{\prime}}$ & Version of $\mathcal{D}_{R}^{\mathbb{X}}$ where images are zoomed into during upload. \\
$\mathcal{D}_{P}^{\mathbb{X}}$ & Proxy-labeled dataset of supposedly real Twitter profile images. \\
$\mathcal{D}_{D}^{\mathbb{X}}$ & Labeled dataset of documented fake Twitter profile images. \\
\hline
\end{tabular}

3.2. Detection

Equipped with these different datasets, we can proceed with the detection of generated profile images. Here, we propose a two-stage procedure to improve the accuracy and the efficiency.

Pre-Filter $\phi$

We start with a pre-filter $\phi$ to discard irrelevant samples. In our case, we can discard images without any face or where the face is too small. We use the efficient BlazeFace \parencite{bazarevskyBlazeFace2019} face detector to detect faces and locate facial landmarks. An image passes $\phi$ if at least one face is detected and the Euclidean distance between the coordinates of both eyes is greater than or equal to $0.1$. The pre-filter serves two purposes: First, the overall computational complexity decreases by reducing the number of analyzed candidates in the subsequent, more demanding detection stage. Second, the detection stage is trained on facial images, so that other types of profile images, such as logos or monochrome images, could be wrongly classified as fake. Filtering irrelevant images can therefore decrease the false positive rate (FPR).
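
The pass condition can be sketched as follows. The snippet takes the eye coordinates of each detected face as input and does not call a face detector itself. It assumes that the coordinates are expressed on the scale to which the threshold of 0.1 refers (for example, relative to the image size), and that an image passes as soon as any detected face is large enough; both points, as well as all names, are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

MIN_EYE_DISTANCE = 0.1  # threshold stated in the text


def eye_distance(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance between the two eye landmarks."""
    return math.dist(left_eye, right_eye)


def passes_prefilter(
    faces: Sequence[Tuple[Point, Point]],
    min_distance: float = MIN_EYE_DISTANCE,
) -> bool:
    """faces holds one (left_eye, right_eye) pair per detected face.

    Returns False if no face was detected or every face is too small.
    Assumption: one sufficiently large face is enough to pass.
    """
    return any(eye_distance(left, right) >= min_distance for left, right in faces)


print(passes_prefilter([]))                            # no face detected
print(passes_prefilter([((0.40, 0.5), (0.60, 0.5))]))  # eye distance 0.2
print(passes_prefilter([((0.48, 0.5), (0.52, 0.5))]))  # eye distance 0.04
```

Because the check only needs two landmarks per face, it is cheap compared to the subsequent classifier, which is in line with the first purpose of the pre-filter.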

Classifier $\mathcal{C}$

To automatically label a profile image as real or fake, we use a state-of-the-art CNN detector based on ResNet-50 \parencite{heDeepResidualLearning2016}. Previous work \parencite{wangCNNgeneratedImagesAre2020, mandelliTrainingCNNsPresence2020, cozzolinoSpoCSpoofingCamera2021, cozzolinoUniversalGANImage2021, gragnanielloAreGANGenerated2021} has demonstrated that this model is able to effectively distinguish real from generated images and that it provides good generalization capabilities. We initially attempted to use pre-trained fake image detectors; however, we found that the heavy pre-processing performed by Twitter makes it necessary to train our own detector (cf. Section 7.1). In particular, we train on the combination of $\mathcal{D}_{R}^{\mathbb{X}}$ and $\mathcal{D}_{P}^{\mathbb{X}}$ for real images, and $\mathcal{D}_{F}^{\mathbb{X}}$ for fake images. The resulting final classifier is denoted by $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$. Note that we experiment with other dataset variations for training the classifier in our ablation study in Section 7.2. Yet, using processed real, fake, and proxy-labeled real images provides the highest performance for processed and zoomed inputs.

3.3. Assistance for Manual Labeling

To estimate error rates of our detection scheme on unlabeled in-the-wild data, it is necessary to manually label these images as real or fake. As generated images have reached a level of quality which makes them almost indistinguishable from real images \parencite{hulzeboschDetectingCNNgeneratedFacial2020, tucciarelliRealnessPeopleWho2020, nightingaleSyntheticFacesHow2021, shenStudyHumanPerception2021, lagoMoreRealReal2022, nightingaleAIsynthesizedFacesAre2022, frankRepresentativeStudyHuman2023}, we use two tools to facilitate this process.

Alignment

Faces generated by StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020} are characterized by being almost perfectly aligned with respect to their facial landmarks, caused by the alignment of the training dataset FFHQ. By superimposing multiple images, this characteristic has been leveraged to visually identify clusters of fake accounts in social networks \parencite{nimmoOperationNavalGazing2020, nimmoIRAAgainUnlucky2020, graphikateamStepMyParler2020, graphikateamFakeClusterBoosts2021, stanfordinternetobservatoryReplyguysGoHunting2020, strickAnalysisProchinaPropaganda2021, goldsteinResearchNoteThis2022}. We automate this manual process by extracting facial landmarks with BlazeFace \parencite{bazarevskyBlazeFace2019} and computing the deviation from a reference. For each landmark $L_{1},\dotsc,L_{12}$ (x- and y-coordinates of eyes, ears, mouth, and nose), we compute its mean $\mu_{i}$ and standard deviation $\sigma_{i}$ over a reference dataset. In our study, we use the training subset of $\mathcal{D}_{F}^{\mathbb{X}}$ as reference. We define an image $x$ as being aligned if the condition

\begin{equation}
|L_{i}(x)-\mu_{i}| < k\sigma_{i} \quad \forall i\in\{1,\dotsc,12\}
\end{equation}

holds, where $L_{i}(x)$ are the landmarks extracted from the image $x$ and $k\in\mathbb{Z}$ controls the maximum deviation from the reference. We set $k=7$. During our evaluation in Section 4, we find that this is the lowest value at which all generated images in the validation set of $\mathcal{D}_{F}$ are aligned. While a close alignment hints towards a face generated by StyleGAN2, the check is ineffective if the image has been cropped or geometrically transformed.
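
The alignment condition translates directly into code. The sketch below assumes that each image is represented by a flat list of its 12 landmark coordinates and uses the population standard deviation; the text does not specify which estimator is used, so this choice and all function names are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_reference(
    reference: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Compute per-landmark mean and standard deviation over a reference set.

    Each entry of reference is one image's 12 landmark coordinates.
    """
    columns = list(zip(*reference))       # regroup values per landmark coordinate
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]  # assumption: population std
    return mu, sigma


def is_aligned(
    landmarks: Sequence[float],
    mu: Sequence[float],
    sigma: Sequence[float],
    k: int = 7,
) -> bool:
    """True if |L_i(x) - mu_i| < k * sigma_i holds for every landmark coordinate."""
    return all(abs(l - m) < k * s for l, m, s in zip(landmarks, mu, sigma))


# Toy reference: three images whose 12 coordinates vary slightly around 0.5.
reference = [[0.49] * 12, [0.50] * 12, [0.51] * 12]
mu, sigma = fit_reference(reference)
print(is_aligned([0.52] * 12, mu, sigma))           # small deviation in every coordinate
print(is_aligned([0.52] * 11 + [0.70], mu, sigma))  # one coordinate far off
```

Since the condition must hold for all 12 coordinates, a single strongly deviating landmark is enough to mark an image as not aligned, which is why cropping or geometric transformations defeat the check.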

Inversion

Additionally, we leverage GAN inversion \parencite{xiaGANInversionSurvey2023} as an assistance tool. For a given input image, this method searches for the latent code that best reconstructs the original input when passed through the generator. We use the implementation provided by \textcite{karrasAnalyzingImprovingImage2020} to invert images using StyleGAN2. Previous work has shown that generated images can be reconstructed more successfully than real images \parencite{albrightSourceGeneratorAttribution2019,karrasAnalyzingImprovingImage2020,pasquiniIdentifyingSyntheticFaces2023} (we provide a visual example in Appendix B). Note that inversion also relies on facial alignment. If an adversary uses a cropped version of a fake face, the inversion result will be distorted. We therefore only use inversion as labeling assistance if the image is aligned.

4. Evaluation

In this section, we proceed with an evaluation of our proposed methodology in a controlled setting with labeled data. This allows us to verify the components of our detection pipeline before studying generated faces in the wild on Twitter in Section 5 and analyzing the corresponding profiles and tweets in Section 6.

Dataset Splits

We randomly split $\mathcal{D}_R$, $\mathcal{D}_F$, and $\mathcal{D}_P^{\mathbb{X}}$ into 8\,500 train, 500 validation, and 1\,000 test images, respectively. $\mathcal{D}_R^{\mathbb{X}}$ and $\mathcal{D}_F^{\mathbb{X}}$ are split in the same manner. As we use $\mathcal{D}_R^{\mathbb{X}'}$ and $\mathcal{D}_F^{\mathbb{X}'}$ only for evaluation, they only contain the corresponding 1\,000 test images, respectively.

Pre-Filter

We start with the pre-filter $\phi$, which should discard irrelevant images but keep potentially generated images. An image passes $\phi$ if a face (a) is detected and (b) has a sufficient size (see Section 3.2). In the following, we apply $\phi$ to the test set from $\mathcal{D}_F^{\mathbb{X}}$ and to 1\,000 randomly sampled images from $\mathcal{D}_W^{\mathbb{X}}$. For the latter subset, we manually label whether each image (partly) contains a human face. Our experiment here has three goals: we want to verify that all generated images from $\mathcal{D}_F^{\mathbb{X}}$ pass $\phi$, confirm that the face detector works reliably on the in-the-wild images from $\mathcal{D}_W^{\mathbb{X}}$, and finally get an estimate of the number of in-the-wild images that are kept and passed to the next stage.

Table 2 shows the results for our evaluation of $\phi$. All generated images from $\mathcal{D}_F^{\mathbb{X}}$ pass $\phi$, fulfilling our first goal. Among the sampled in-the-wild images from $\mathcal{D}_W^{\mathbb{X}}$ with a face, the face detector correctly identifies 92.47\,\%. We manually look through the undetected faces. In most cases, the face is either very small, obstructed (e.g., by masks or smartphones), or partly outside the frame. The face in these images is not prominent, so we consider it acceptable to skip them. For in-the-wild images without a face, the face detector mistakenly locates a face in 42.53\,\% of the cases. We manually inspect the mislabeled images. The vast majority contains faces, but they are drawn, digitally created, or belong to animals or statues. Only very few detections are obviously “wrong”, such as images with Twitter’s former default profile image. Since these images are just passed to the next stage, having some false positives is not critical. Based on this analysis, we can conclude that the face detector works reliably, fulfilling our second goal. Finally, we measure how many images from $\mathcal{D}_W^{\mathbb{X}}$ additionally pass the size check and therefore $\phi$. In only 58.16\,\% of the face images and 30.27\,\% of the non-face images, the face is considered large enough, considerably reducing the number of images passed to the next stage. Overall, we conclude that our pre-filter allows us to skip irrelevant images efficiently, without mistakenly discarding generated faces.

Table 2. Evaluation of our pre-filter $\phi$. We separately analyze its two conditions, which are the presence of a face and its sufficient size.
\begin{tabular}{lcccc}
 & \multicolumn{2}{c}{With Face} & \multicolumn{2}{c}{Without Face} \\
Dataset & Face Detected & Size Check & Face Detected & Size Check \\
$\mathcal{D}_F^{\mathbb{X}}$ & 100\,\% & 100\,\% & & \\
$\mathcal{D}_W^{\mathbb{X}}$ & 92.47\,\% & 58.16\,\% & 42.53\,\% & 30.27\,\% \\
\end{tabular}

Classification

Next, we verify that our classifier $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ is capable of spotting generated images in realistic settings. We evaluate the performance of our classifier under three conditions: (a) processed images ($\mathcal{D}_R^{\mathbb{X}}$ vs. $\mathcal{D}_F^{\mathbb{X}}$), (b) zoomed images ($\mathcal{D}_R^{\mathbb{X}'}$ vs. $\mathcal{D}_F^{\mathbb{X}'}$), and (c) proxy-labeled real and fake images ($\mathcal{D}_P^{\mathbb{X}}$ vs. $\mathcal{D}_F^{\mathbb{X}}$). We use the test set from each dataset.

Figure 1 shows the respective ROC curves. Our classifier has an almost perfect detection rate with an AUC value close to 1.0. Note that the setup on zoomed data is slightly more challenging, because there are no examples of zoomed images in the detector’s training data. Still, the error rate remains very small. Due to the strong class imbalance on real Twitter data, a small error rate is required to avoid an excessive amount of false positives.
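To make the last point concrete, the following sketch computes the AUC from raw scores and shows how class imbalance affects precision. It is an illustration only: `roc_auc` uses the pairwise (rank-based) definition of the AUC, and the rates passed to the hypothetical helper `expected_precision` are made-up numbers, not measurements from our study.

```python
import numpy as np

def roc_auc(scores_real, scores_fake) -> float:
    """Probability that a random fake image scores higher than a random real one.

    Ties count half. This equals the area under the ROC curve.
    """
    real = np.asarray(scores_real, dtype=float)
    fake = np.asarray(scores_fake, dtype=float)
    greater = (fake[:, None] > real[None, :]).sum()
    ties = (fake[:, None] == real[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(fake) * len(real)))

def expected_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of flagged images that are truly fake, given the share of fakes in the data."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Toy scores: perfectly separated classes give an AUC of 1.
auc = roc_auc([0.1, 0.2], [0.8, 0.9])

# Hypothetical rates: even a 1 % false positive rate yields a low precision
# when only 0.05 % of all images are fake.
precision = expected_precision(tpr=0.97, fpr=0.01, prevalence=0.0005)
print(auc, round(precision, 4))
```

With these illustrative numbers, fewer than one in twenty flagged images would actually be fake, which is why a very small error rate matters in this setting.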

Figure 1. Evaluation of our classifier $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$. We show the ROC curve under different conditions.

Assistance Tools

We finally verify our methods that allow us to better label images for the error-rate estimation later.

Alignment. Using the test set from $\mathcal{D}_F^{\mathbb{X}}$, we confirm that with $k = 7$, all fake images are correctly labeled as being aligned. Of the 1\,000 randomly sampled images from $\mathcal{D}_W^{\mathbb{X}}$, only 35 are aligned.

Inversion. We first verify that generated images can be inverted more accurately than real images. We invert 500 images from $\mathcal{D}_R^{\mathbb{X}}$ and $\mathcal{D}_F^{\mathbb{X}}$, respectively, and compute the LPIPS \parencite{zhangTheUnreasonableEffectiveness2018} distance between original and reconstructed images. This distance metric measures the perceptual similarity between two images and has been previously used to estimate the reconstruction quality \parencite{karrasAnalyzingImprovingImage2020}. The histograms in Figure 2 show that the reconstructions from $\mathcal{D}_F^{\mathbb{X}}$ are perceptually more similar to the originals compared to the reconstructions from $\mathcal{D}_R^{\mathbb{X}}$. A classification based on the LPIPS scores results in an AUC of 0.97.

Figure 2. Evaluation of GAN inversion. The lower the LPIPS distance between the original image and its reconstruction, the more similar they are.

As a second experiment, we check that inversion is helpful for manual labeling. We divide the 1\,000 images (500 real, 500 fake) into 900 training and 100 test images. For each image, we construct a side-by-side view with the original, its reconstruction obtained by inversion, and the distance measured in LPIPS and MSE (cf. Appendix B). Using the training set, one annotator practices the manual classification. We then evaluate the performance based on the held-out test images. 99 out of 100 images are correctly assigned, demonstrating that manual inspection is feasible.

We emphasize that images from $\mathcal{D}_R^{\mathbb{X}}$ are very similar to images from $\mathcal{D}_F^{\mathbb{X}}$. In contrast, most in-the-wild profile images are visually different, leading to even worse reconstructions (despite being aligned) and thus to comparatively high LPIPS values. Hence, we expect the actual manual labeling process to be easier than in the controlled setting.

Summary

Our evaluation indicates a valid detection pipeline. The pre-filter allows skipping irrelevant images while the classifier allows detecting generated images. The assistance tools can help with the manual labeling process.

5. Detecting Generated Images In the Wild

Equipped with a valid detection pipeline, we can now explore the prevalence of generated images on Twitter. To this end, we first need to calibrate the detection to the real-world setup before we can present the final results.

Manual Labeling

To begin with, we have to label a subset of $\mathcal{D}_W^{\mathbb{X}}$. First, this allows us to get a detection threshold, so that we are able to classify individual images as real or generated. Note that in the controlled setup before, we evaluated the overall performance of $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ with the AUC metric that takes into account all possible thresholds and thus does not require picking a specific value. Second, a separate labeled set is necessary to estimate error rates.

Unfortunately, manual labeling of all samples in $\mathcal{D}_W^{\mathbb{X}}$ is infeasible due to the sheer volume of samples within the dataset. Thus, we resort to a random subset containing 10\,\% of all samples. We then sort these images based on their score (from $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$) from low (real) to high (fake) and select the 1\,000 highest-scoring images that pass $\phi$ for manual labeling. We acknowledge that choosing the subset based on the classifier that we are trying to evaluate introduces an unwanted bias: there could be fake images with very low scores that are overlooked. However, we argue that this approach strikes a balance between practicability and a sound estimation. Selecting the subset by pure chance would require an enormous amount of manual labeling to gather a sufficient number of fake images. Moreover, the scores of our subset range from 1.0 to 0.33. From the test set of $\mathcal{D}_F^{\mathbb{X}}$, only 3 out of 1\,000 images get a score below 0.33. We therefore assume that only a very small number of false negatives is potentially overlooked.

We carefully inspect each image and, if it is aligned, its reconstruction from GAN inversion. We label an image as real if the framing and pose do not match those of $\mathcal{D}_F$, if it contains a complex and meaningful background, or if the reconstruction deviates significantly from the original. In contrast, images are labeled as fake if they contain diffuse backgrounds, asymmetries (eyes, earrings), unnatural clothing, color artifacts, and/or an almost perfect reconstruction. By doing so, we obtain 185 images labeled as “Real”, 725 images labeled as “Fake”, and 90 images labeled as “Unsure”. Most images with the label “Unsure” resemble images from TPDNE, but do not contain clear artifacts or were strongly edited. We also assigned this label if we suspected that an image was generated using a different kind of generative model. We randomly split the 910 images labeled as “Real” or “Fake” into a validation set (for calibrating the threshold) and a test set (for estimating the error rates) of equal size, maintaining the label ratio in both splits.
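A split of equal size that maintains the label ratio can be sketched as follows. This is one possible procedure under our own assumptions (a fixed random seed and alternating which half receives the extra item of an odd-sized class); the function name `stratified_halves` is hypothetical.

```python
import random

def stratified_halves(labels, seed=0):
    """Split indices into two halves with (nearly) the same label ratio."""
    rng = random.Random(seed)
    first, second = [], []
    for label in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == label]
        rng.shuffle(idx)
        # Give the extra item of an odd-sized class to the currently smaller half.
        half = len(idx) // 2 if len(first) > len(second) else (len(idx) + 1) // 2
        first.extend(idx[:half])
        second.extend(idx[half:])
    return first, second

# Label counts from the manual labeling step: 185 "Real" and 725 "Fake".
labels = ["real"] * 185 + ["fake"] * 725
validation_idx, test_idx = stratified_halves(labels)
print(len(validation_idx), len(test_idx))
```

Since both class sizes are odd, the two halves cannot have identical label counts; they differ by one image per class.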

Choosing a Threshold

Due to the high imbalance between real and generated images in $\mathcal{D}_W^{\mathbb{X}}$, choosing an appropriate threshold is not trivial. A threshold that is too high leads to many overlooked fake images (low recall), while a threshold that is too low leads to many real images classified as fake (low precision). As recall and precision are equally relevant in our setting, we follow the common practice of selecting the threshold based on the F1-score (computed on the validation set). The best F1-score (0.9832) is achieved using a threshold of 0.9899361. Such a high threshold might appear counterintuitive. Yet, Figure 3 shows that most fake images are confidently classified as fake, with scores very close or equal to 1. The scores of real images, however, have greater variation. Thus, choosing a relatively high threshold gives the best performance. Note that the scores of real images in this subset are not representative of all real images, since we purposely selected images with high scores.
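Selecting a threshold by F1-score can be sketched as a scan over candidate thresholds. The sketch assumes that an image is classified as fake if its score is greater than or equal to the threshold and that the candidates are the observed scores; both are assumptions for illustration, and the toy scores below are not from our validation set.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1; an image is called fake if score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = fake
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):  # candidate thresholds in ascending order
        pred = scores >= t
        tp = int(np.sum(pred & labels))
        fp = int(np.sum(pred & ~labels))
        fn = int(np.sum(~pred & labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1

# Toy data: fake images cluster near 1, real images are spread out.
toy_scores = [0.2, 0.6, 0.95, 0.99, 1.0, 1.0]
toy_labels = [0, 0, 0, 1, 1, 1]
threshold, f1 = best_f1_threshold(toy_scores, toy_labels)
print(threshold, f1)
```

As in Figure 3, when fake scores concentrate near 1 and real scores vary more, the F1-optimal threshold ends up close to 1.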

Figure 3. Score distribution of manually labeled images.

Estimating Error Rates

Equipped with our selected threshold, we can now estimate the error rates of our detector. We start with the test set of our manually labeled subset and calculate the FNR and FDR here. The FNR, the fraction of mislabeled fake images, is 3.03\,\%. The FDR, the fraction of real images among all images classified as fake, is 1.4\,\%.
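The two error rates follow directly from the confusion counts, as the following sketch shows. The function name `error_rates` and the toy labels are illustrative assumptions, not our evaluation code or data.

```python
def error_rates(y_true, y_pred):
    """Return (FNR, FDR) for boolean sequences where True means fake.

    FNR = FN / (TP + FN): share of fake images classified as real.
    FDR = FP / (TP + FP): share of real images among images classified as fake.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return fnr, fdr

# Toy example: four fake images (one missed) and two real images (one flagged).
toy_true = [True, True, True, True, False, False]
toy_pred = [True, True, True, False, True, False]
fnr, fdr = error_rates(toy_true, toy_pred)
print(fnr, fdr)
```

Note that the FDR is normalized by the number of images classified as fake, not by the number of real images, so it is directly interpretable as the share of false alarms among the detections.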

To understand the errors, we take a closer look at the misclassified images. Figure 4 shows the false negatives within the test set together with their scores. Although the majority actually gets a high score and is only classified as real due to the high threshold, three images have a considerably lower score. We cannot identify a pattern which causes their misclassification. Neither do we observe any characteristics that would explain the real images classified as fake (FDR). As these profile images show real users, we cannot provide visual examples here.

In addition, we can leverage our independent dataset $\mathcal{D}_D^{\mathbb{X}}$ of fake profile images that were spotted by users on the web before. We obtain a low FNR of 2.88\,\%, that is, 39 out of 1\,353 fake profile images are incorrectly labeled as real. All images pass the pre-filter $\phi$. All images in $\mathcal{D}_D^{\mathbb{X}}$ are aligned according to our definition. False negatives therefore only depend on the classifier’s score.
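As a quick check of the reported rate, using only the two counts given above:

\[
\mathrm{FNR} = \frac{39}{1\,353} \approx 0.0288 = 2.88\,\%.
\]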

Overall, we can confirm the performance of our detector on two different test sets. While the error rates are not zero, they are small enough to draw conclusions in our analysis in the next section.

Figure 4. Examples of fake images falsely classified as real, together with their classification score.

Prevalence of Fake Profiles on Twitter

We are now ready for the final step. We apply our detection scheme on the entire in-the-wild dataset $\mathcal{D}_W^{\mathbb{X}}$. The pre-filter $\phi$ discards 8\,535\,322 images, reducing the number of images by 56.94\,\%. Next, using our detector, we classify 7\,723 profile images as fake. This is 0.052\,\% of the full dataset. In the next section, we analyze the profiles behind these images and their tweets in more detail.
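These figures are mutually consistent. As a rough check, the dataset size $N$ can be inferred from the discarded count and the reduction rate (an approximation, since the percentage is rounded), and the prevalence follows from it:

\[
N \approx \frac{8\,535\,322}{0.5694} \approx 1.499 \times 10^{7},
\qquad
\frac{7\,723}{N} \approx 5.15 \times 10^{-4} \approx 0.052\,\%.
\]

This matches the nearly 15 million profile images stated in the abstract.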

6. Analysis

Our goal in this section is to understand the context where the generated profile images are used. To this end, we first perform an analysis of the accounts behind these images (Section 6.1). Then, we thoroughly analyze the content of the tweets that were sent from these accounts (Section 6.2). For simplicity, we refer to accounts using generated profile images as “fake-image accounts” as opposed to “real-image accounts” in the following.

6.1. User Metrics

We begin by analyzing the difference between fake-image and real-image accounts regarding social connections, account activity, as well as account creation and status.

Social Connections

On Twitter, social interactions are primarily measured in the number of followers an account has and the number of other accounts it follows. Figures 5(a) and 5(b) visualize the distribution of these metrics for real- and fake-image accounts at the time of data collection. We find that fake-image accounts have fewer followers (mean: 393.35, median: 60) compared to real-image accounts (mean: 5\,086.38, median: 165) in our dataset. 1\,997 (25.86\,\%) of all fake-image accounts have 9 or fewer followers and 1\,063 (13.76\,\%) have exactly zero followers. We notice that 1\,996 fake-image accounts (25.84\,\%) have exactly 106 followers. Our content analysis in Section 6.2 reveals that these accounts belong to a large cluster of fake accounts involved in coordinated inauthentic behavior.

We find that fake-image accounts also follow fewer other accounts (mean: 283.18, median: 21) compared to real-image accounts (mean: 759.83, median: 262). Interestingly, 2\,175 fake-image accounts (28.16\,\%) follow exactly two other accounts. In contrast to the number of followers, a relatively small number of fake-image accounts (163, 2.11\,\%) follows exactly zero other accounts.

Activity

Figure 5(c) shows that fake-image accounts do participate in Twitter based on the number of tweets. Yet, they are overall less active than real-image accounts. On average, fake-image accounts posted 3\,158.9 (median: 112) tweets, as opposed to 17\,096.39 (median: 3\,450) tweets from real-image accounts. 1\,948 (25.22\,\%) of all fake-image accounts have 10 or fewer tweets. In addition, Figure 5(d) shows the average number of tweets per day, calculated by dividing the total number of tweets by the number of days the account exists. Based on the median, fake-image accounts are still less active than real-image accounts (0.95 vs. 3.7 tweets per day). However, some fake-image accounts post exceptionally many tweets per day, causing a higher mean (19.96 vs. 13.56 tweets per day). In particular, there are 266 fake-image accounts (3.44\,\%) that submitted more than 100 tweets per day.
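The following sketch illustrates the tweets-per-day metric and why the mean and median can point in different directions. The dates and rates are invented for illustration, and clamping the account age to at least one day is our own assumption to avoid a division by zero for accounts created on the collection day.

```python
from datetime import date
from statistics import mean, median

def tweets_per_day(total_tweets: int, created: date, collected: date) -> float:
    """Total tweets divided by the number of days the account has existed."""
    days = max((collected - created).days, 1)  # assumption: at least one day
    return total_tweets / days

# An account created ten days before collection with 300 tweets.
rate = tweets_per_day(300, date(2023, 3, 1), date(2023, 3, 11))

# Hypothetical rates: mostly quiet accounts plus one very active account.
toy_rates = [0.5, 0.9, 1.0, 1.2, 150.0]
print(rate, mean(toy_rates), median(toy_rates))
```

A single high-volume account pulls the mean far above the median, which is the pattern we observe for fake-image accounts.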

(a) Number of followers.
(b) Number of users an account follows.
(c) Number of tweets.
(d) Average number of tweets per day.
Figure 5. Distributions of user metrics from real- and fake-image accounts. The points depict 1\,000 randomly selected samples from each class, respectively.

(a) Real- and fake-image accounts.
(b) Fake-image accounts split by status.
(c) Similar to (b), showing only the last three months.
Figure 6. Distributions of account creation times from real- and fake-image accounts. In (b) and (c) we differentiate fake-image accounts by their status nine months after data collection. The points depict up to 1\,000 randomly selected samples for each label and status, respectively.

Account Creation and Status

Figure 6(a) compares the times of account creation. Fake-image accounts are considerably “younger”, with more than half of them (52.38\,\%) being created in 2023 (note that our data collection happened in March 2023). In contrast, only 6.22\,\% of real-image accounts have been created in this period.

In addition to the creation date, we also examine the account status after a certain period of time. We checked the status of all 7\,723 fake-image accounts nine months after data collection by querying the respective profile page. As a reference, we did the same for an equal number of randomly sampled real-image accounts. Accounts can be either alive, deactivated (by the user), or suspended (by Twitter). Figure 7 illustrates that more than half of the fake-image accounts (52.07\,\%) have been suspended. In contrast, only 5.01\,\% of real-image accounts in the reference set have been suspended. The high number of suspended fake-image accounts suggests that they were violating Twitter’s rules.

In Figures 6(b) and 6(c), we analyze the account creation of fake-image accounts given their status. We observe various suspended accounts that were created in bulk just shortly before our data collection, especially in the middle of February. Note that we do not know when these accounts were suspended, so that we cannot determine the effective lifetime of these accounts.

Figure 7. Account status nine months after data collection.

Takeaways

Our analysis shows that real- and fake-image accounts notably differ. Fake-image accounts have fewer social interactions, both regarding the number of followers and the number of accounts they follow. While these metrics are distributed evenly for real-image accounts, we observe patterns with fake-image accounts. There are large groups with identical values, indicating an orchestrated network of inauthentic users. Moreover, fake-image accounts are not passive: they considerably participate in Twitter based on the number of tweets. Although they are in general less active than real-image accounts, there are several fake-image accounts that post very frequently, hinting towards spamming attacks. Finally, fake-image accounts have a more limited lifetime. They are usually created more recently than real-image accounts, and they are also suspended by Twitter disproportionately often. This suggests inauthentic behavior. Moreover, a substantial number was created in bulk just before our data collection period started. This bulk creation (or batch creation) is a common pattern for inauthentic behavior, used, for example, to amplify messages or to participate in spamming or trolling activities \parencite{gurajalaFakeTwitterAccounts2015,ferraraTwitterSpamFalse2022}.

6.2. Content Analysis

To evaluate the purpose of the identified fake-image accounts, we proceed to analyze their tweets (original as well as retweets) posted in 2023. We utilize data collected in the context of a large-scale Twitter stream archiving effort \parencite{fafalios2018tweetskb} based on Twitter’s 1\,\% sampled stream (the same we used to create $\mathcal{D}_W^{\mathbb{X}}$). This allows us to access information about the activity of the profiles before and after the profile collection week (until Twitter restricted access to its API in June 2023). In total, we have access to 111\,165 tweets from the 7\,723 fake-image accounts in our collection.

We begin our analysis with the language and availability. The upper half of Figure 8 shows a breakdown of the number of tweets per language. Using the accounts’ status nine months after our data collection (cf. Section 6.1), we can also calculate the fraction of unavailable tweets. Overall, 49.6\,\% of all tweets were unavailable after nine months. Interestingly, Turkish and Arabic stand out as languages with significantly higher unavailability rates (87.95\,\% and 95.59\,\%, respectively) than other languages. The number of unique accounts that created the tweets in each language is reported in the lower half of Figure 8. It shows that Turkish tweets, for instance, stem from a relatively small number of users.

Figure 8. Number of tweets (top) and corresponding accounts (bottom) for the 20 most represented languages. The smaller bar indicates the number of unavailable tweets/accounts in each language. Note that a single account can be associated with multiple languages.

We proceed with a textual analysis. To identify structural patterns, we employ state-of-the-art sentence embeddings \parencite{reimers-2019-sentence-bert} to group the tweet texts into semantically related clusters. We use the cosine similarity between the sentence embeddings to determine cluster membership. A new observation (tweet) is assigned to an existing cluster if a similarity threshold (in our case 0.6) is reached; otherwise, a new cluster is created. Furthermore, we limit our analysis to clusters with a minimum size of 50 tweets. This approach allows us to identify dominant trends. Note that it does not provide a distribution of topics, because not every tweet is assigned to a cluster. For the purpose of visualization, we use UMAP \parencite{mcinnesUMAPUniformManifold2020} as a dimensionality reduction technique to generate a two-dimensional representation of the clustering outcome (cf. Figure 9). For each cluster, we calculate the class-based term frequency–inverse document frequency (TF-IDF) scores to determine representative class tokens. In a subsequent step, we conduct a manual qualitative review of all clusters to identify and describe common themes, which are detailed in the following paragraphs. We describe the general cluster contents and provide representative examples for important topics. We also analyze the metadata of the accounts within each cluster and report unusual characteristics.
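The threshold-based assignment can be illustrated with a short sketch. It assumes that the embeddings are already available as a NumPy array with non-zero rows, and that a tweet is compared against the running centroid of each cluster; the text does not state which cluster representative is used, so the centroid is an assumption made here for illustration. The function name and parameters are likewise hypothetical.

```python
import numpy as np


def threshold_clustering(embeddings, threshold=0.6, min_size=50):
    """Greedy single-pass clustering on cosine similarity.

    Each embedding joins the most similar existing cluster if the cosine
    similarity to that cluster's centroid reaches `threshold`; otherwise it
    starts a new cluster. Only clusters with at least `min_size` members
    are returned, as lists of row indices.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Normalize rows so that a dot product equals the cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sums = []      # per-cluster sum of unit vectors (unnormalized centroid)
    members = []   # per-cluster list of row indices
    for idx, vec in enumerate(normed):
        best, best_sim = -1, threshold
        for c, total in enumerate(sums):
            sim = float(vec @ (total / np.linalg.norm(total)))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best < 0:
            sums.append(vec.copy())
            members.append([idx])
        else:
            sums[best] += vec
            members[best].append(idx)
    return [m for m in members if len(m) >= min_size]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05], [0.1, 0.9]]
    print(threshold_clustering(toy, threshold=0.6, min_size=2))
```

A single-pass scheme like this depends on the order of the input, and the minimum-size filter discards small clusters entirely, which is consistent with the remark that the result does not yield a full topic distribution.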

[Figure 9 cluster annotations: desantis, trump, florida, governor; #auspol, linda, voice; covid, vaccine, people, vaccines; ukraine, ukrainian, nazi, #ukraine, war; #stockmarket, crypto, nft, #banknifty, chart, day; heart, amazing, looks, cute, gorgeous, cute, beautiful, lovely, charming; win, giveaway drop, follow, retweet]
Figure 9. UMAP representation of English tweets posted by users that are still available. Distinct (groups of) clusters are annotated by their most representative tokens. Different clusters are separated by color.

English (Unavailable tweets)

The clustering of English content posted by users who are no longer on the platform reveals a notable pattern: we observe a single, extremely large cluster that encompasses 49.67\,\% of all unavailable English tweets. Despite the variability of the actual content, these tweets all share a common structure. Each tweet begins by mentioning a specific Twitter user, followed by a short sequence of English terms. Interestingly, these sequences do not form logical sentences; they are neither semantically nor syntactically correct. Each of these sequences is then followed by a specific Chinese hashtag that can be translated to “This is really useful”. Unfortunately, we can only speculate about their purpose. Our hypothesis is that the hyperlinks embedded in the tweets may have directed users to malicious external websites. As the links are no longer functional, we cannot verify this hypothesis.

The accounts’ metadata corroborate the assumption that the 1\,579 accounts within this cluster were part of an organized network. All but three were created between February 16 and February 20, which is consistent with our observations in Figure 6(c). Up to 754 accounts were created on a single day. We also find that this cluster contributes to the large number of accounts with identical social connections (cf. Figure 5(a) and Figure 5(b)): 94.93\,\% have exactly 106 followers and 95.31\,\% follow exactly two other accounts. The usernames (Twitter handles) appear to be constructed from a list of German-sounding first names and last names (or initials), optionally followed by one or more digits (e.g., @GuntherForstner86). Moreover, 67.44\,\% of all accounts have the same display name, which can be translated to “Noon Namshi Sivvi discount code is strong and effective” (Noon and Namshi are e-commerce platforms operating in the Arabic region). These accounts also have their location set to “KSA” (Kingdom of Saudi Arabia). In addition, the accounts contain nonsense descriptions like “Personal west service street laugh small.”. We hypothesize that these were automatically generated or translated. Again, we can only speculate about the reason, especially regarding the mixed use of English, German, Arabic, and Chinese. We also notice that 99.18\,\% of all accounts in this cluster use profile images that are duplicates within our dataset of fake-image accounts. Appendix D elaborates on our method for identifying duplicate images.

The remaining clusters mainly focus on giveaways, often related to cryptocurrencies, with tweets like

$50 (2 winners x $25) 24 hours - like, follow)
i will #giveaway 100 usdt worth of $loop as we celebrate our 10k milestone

or the promotion of illegal content such as links to broadcasting streams of soccer matches, e.g.,

live stream arouca vs benfica live [link]

Another trend is the distribution of links to websites and Telegram groups containing explicit content, e.g.,

follow for more [link]

English (Available tweets)

The clustering of tweets from users who were still active after nine months reveals similarities and differences. Figure 9 depicts a visual representation of the top clusters with their representative text tokens. A significant portion of all clusters is again related to various forms of cryptocurrency, stocks, and giveaways, e.g.,

drop your #tezos #nft if you need it sold!
15000$ in $eth — 5 lucky winners!

Additionally, we find a significant share of clusters related to adult content. These tweets actively advertise explicit content, often through dedicated patterns like

beautiful/charming/etc. [profile of porn actress] [link]

Compared to inactive users, we observe that available accounts also engage in discussions on contentious or political issues. These include, for example, the war in Ukraine, election-related discourse, and debates on COVID and vaccinations:

welcome to nazi ukraine #russia
desantis racks up wins while trump, potential 2024 opponents take swipes at florida governor
albos crocodile tears: watch this video, that the mainstream media refuses to show.
someone needs to find an antidote for the vaxxx

Turkish (Unavailable tweets)

For the Turkish accounts, we restrict our analysis to content posted by users who have been removed, since this is the majority of the dataset. Our findings indicate that all of this content is related to pornography or escort services. The primary distinction among the clusters is the set of cities mentioned within the posts. Most tweets also contain links to other websites, which are no longer functional. Upon examining the metadata of all 932 accounts, we again identify the systematic pattern for usernames that we already observed in the large cluster of English tweets. However, first and last names appear to be of Turkish origin. Moreover, almost all accounts have their location set to a real Turkish city. Again, 46.78\,\% of all accounts use duplicate profile images and 85.52\,\% were created within one month. These findings again indicate that at least some systematic approach (automatic or semi-automatic) is used to generate the accounts.

Arabic (Unavailable tweets)

For Arabic tweets, we again only consider accounts that have been suspended. All clustered tweets appear to be related to literature, with individual clusters being characterized by mentions of certain authors, countries, or topics, all related to the Arabic region. These tweets make up 72.34\,\% of all unavailable Arabic tweets. Surprisingly, the tweets share a common structure with those from the large cluster of English tweets: they contain the specific Chinese hashtag, an external link, and an incoherent sentence. Our metadata analysis suggests that the 1\,806 accounts indeed belong to the same network, despite the different language. Almost all accounts were created between February 16 and February 20, with 892 being created on a single day. We observe the same anomalies regarding the (German) usernames, locations, descriptions, and social connections. Given the book-related content and the frequently occurring display name that promotes a discount code, we hypothesize that the external links might have referred to the respective shopping platforms.

Takeaways

Our content analysis reveals that English, Turkish, and Arabic are the dominant languages used by the fake-image accounts in our collection. We identify large networks of fake-image accounts that were probably automatically created and that participated in large-scale spamming attacks. We observe recurring patterns as part of the automation. Accounts are created in bulk. Tweets, usernames, locations, descriptions, and social connections follow a systematic pattern. Multiple accounts within a network share the same profile image. Furthermore, our analysis shows that frequently occurring topics are cryptocurrencies, giveaways, and content related to pornography and escort services. Fake-image accounts also participate in controversial political discussions. These findings align with prior analyses of inauthentic content on Twitter \parencite{ratkiewiczDetectingTrackingPolitical2011,cresciDecadeSocialBot2020,nizzoliChartingLandscapeOnline2020,pfefferJustAnotherDay2023}.

6.3. Sample Study on Available Accounts

Finally, we analyze the current behavior of fake-image accounts that are still active at the time of writing (February–March 2024). This gives insights into the use cases of rather long-lived fake accounts. As we can no longer use data from Twitter’s API, we randomly select 1\,000 available fake-image accounts and visit their Twitter profiles manually. Two annotators independently check the most recent tweets and assign a topic to each profile (Cohen’s kappa: 0.84). Accounts on which the annotators disagree are revisited. We choose topics from five categories, so that we can get a broad understanding of the prevalent application scenarios.
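For readers unfamiliar with the agreement measure, the following sketch shows how Cohen’s kappa can be computed from two annotators’ labels. The topic labels in the demo are hypothetical and only illustrate the calculation; they are not the annotations from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["politics", "politics", "finance", "finance"]
    annotator_2 = ["politics", "politics", "finance", "politics"]
    print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, so it is lower than the plain fraction of matching labels whenever some categories are much more frequent than others.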

Figure 10. Topic distribution of 1\,000 manually inspected available accounts.

Figure 10 depicts the distribution of topics. The majority of fake-image accounts participates in the political discourse (36.1\,\%) or shares finance-related content (33.1\,\%), mostly related to cryptocurrencies. A further 5.1\,\% of the profiles revolve around other websites or products (“Business”), while 4.3\,\% share explicit content or promote escort services (“Sex”). The remaining accounts (21.4\,\%) cover diverse topics or have an empty timeline. Taken together, we observe similar topics as before in our cluster analysis.

6.4. Summary

Our systematic analysis revealed 7\,723 Twitter accounts that use AI-generated profile images. By analyzing both their user metrics and the content of their tweets, we identify particular patterns. Some of these patterns, like the high number of suspended accounts, striking similarities within the accounts’ properties, or the multitude of similar tweets posted by different users, represent strong evidence that a subset of these accounts is part of organized, inauthentic networks. While many accounts amplify content related to cryptocurrencies or pornography, we also observe accounts that express controversial political opinions.

7. Ablation Study

Before concluding our study, we briefly validate the design choices of the classification methodology proposed in Section 3. In particular, we justify the need to train our own classifier (Section 7.1) and study the impact of the training data (Section 7.2).

7.1. Evaluation of Pre-Trained Detectors

Detecting GAN-generated images is a well-researched problem and several pre-trained detectors have been proposed (cf. Section 2). However, we observe that the performance of these detectors suffers from Twitter’s image processing, making it necessary to train a classifier directly on processed profile images.

Setup

We test three existing pre-trained classifiers: $\mathcal{C}_{\text{Wang}}$ \parencite{wangCNNgeneratedImagesAre2020} (which is the basis of our classifier), $\mathcal{C}_{\text{Grag}}$ \parencite{gragnanielloAreGANGenerated2021}, and $\mathcal{C}_{\text{Ojha}}$ \parencite{ojhaUniversalFakeImage2023}. Appendix C provides more details on these three classifiers. We evaluate four conditions (see Table 3). The conditions (a)-(c) correspond to those in Figure 1 and all use images processed by Twitter. We additionally test the pre-trained detectors on unprocessed images in condition (d).

Results

Table 3 shows the AUCs of the three classifiers compared to $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$. Our trained detector significantly outperforms the pre-trained classifiers under the Twitter conditions (a)-(c). The fact that the latter perform better under the clean condition (d) demonstrates the strong effect of Twitter’s processing. It is therefore not possible to use a pre-trained detector for our study of in-the-wild profile images.

Table 3. Evaluation of existing pre-trained detectors. We report the AUCs under different conditions.
\begin{tabular}{lcccc}
\hline
Condition & $\mathcal{C}_{\text{Wang}}$ & $\mathcal{C}_{\text{Grag}}$ & $\mathcal{C}_{\text{Ojha}}$ & $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ \\
\hline
(a) $\mathcal{D}_{R}^{\mathbb{X}}$ vs. $\mathcal{D}_{F}^{\mathbb{X}}$ & 0.7279 & 0.9249 & 0.6405 & 0.9998 \\
(b) $\mathcal{D}_{R}^{\mathbb{X}'}$ vs. $\mathcal{D}_{F}^{\mathbb{X}'}$ & 0.7243 & 0.9600 & 0.6338 & 0.9997 \\
(c) $\mathcal{D}_{P}^{\mathbb{X}}$ vs. $\mathcal{D}_{F}^{\mathbb{X}}$ & 0.8713 & 0.9015 & 0.6922 & 0.9998 \\
(d) $\mathcal{D}_{R}$ vs. $\mathcal{D}_{F}$ & 0.9466 & 1.0000 & 0.8296 & -- \\
\hline
\end{tabular}
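As a reminder of what the reported metric measures, the sketch below computes the AUC directly from its rank-based definition: the probability that a randomly drawn fake image receives a higher detector score than a randomly drawn real image, with ties counted as one half. It assumes that higher scores mean “more likely fake”; the scores in the demo are made up for illustration and do not come from the evaluated detectors.

```python
def roc_auc(scores_real, scores_fake):
    """Area under the ROC curve via pairwise comparison of scores.

    Quadratic in the number of samples, so it is only meant to illustrate
    the definition, not to process large test sets.
    """
    if not scores_real or not scores_fake:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for fake_score in scores_fake:
        for real_score in scores_real:
            if fake_score > real_score:
                wins += 1.0
            elif fake_score == real_score:
                wins += 0.5  # ties contribute half a win
    return wins / (len(scores_fake) * len(scores_real))


if __name__ == "__main__":
    real = [0.10, 0.40, 0.35]   # hypothetical scores for real images
    fake = [0.80, 0.30]         # hypothetical scores for generated images
    print(roc_auc(real, fake))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the values of the pre-trained detectors under Twitter’s processing into perspective.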

7.2. Effect of Training Data

Our datasets described in Section 3.1 allow for different combinations of training data. In the following, we justify the choice of training our detector on real images, proxy-labeled real images, and fake images.

Setup

We consider three classifier variants and analyze their performance under each of the three conditions from Figure 1. The classifier $\mathcal{C}_{R/F}$ is trained on $\mathcal{D}_{R}$ and $\mathcal{D}_{F}$ and represents the most straightforward option. The images are not processed by Twitter, but we resize them to $400 \times 400$ pixels to match the resolution of actual profile images. The second classifier, $\mathcal{C}_{R^{\mathbb{X}}/F^{\mathbb{X}}}$, is trained on the same images but with Twitter’s processing. Finally, $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ is additionally trained on $\mathcal{D}_{P}^{\mathbb{X}}$ as real images.

Results

Table 4 shows the AUCs of the three detector variants under the different conditions. Our finally chosen classifier, $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$, achieves the highest AUC in all conditions (tied with $\mathcal{C}_{R^{\mathbb{X}}/F^{\mathbb{X}}}$ in condition (a)). The classifier $\mathcal{C}_{R/F}$ trained on unprocessed images performs worse than both variants trained on processed images, confirming our findings from Section 7.1. Note that, while an AUC $> 0.99$ is still very high, the small difference can cause a significant increase in false positives, given the size of $\mathcal{D}_{W}^{\mathbb{X}}$. Overall, our results provide two insights. First, the classifier $\mathcal{C}$ should be trained on images that are processed similarly to the target images. Second, including proxy-labeled real images from the target distribution ($\mathcal{D}_{P}^{\mathbb{X}}$) improves the detection performance. A closer look shows that this causes a better separation of the classifier scores, shifting scores towards either end of the output range. This motivates our choice of $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ for our study.
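The remark about false positives at scale can be made concrete with a small calculation. The false positive rates below are hypothetical values chosen only for illustration, and the dataset size of 14\,989\,385 images is used as an approximation for the number of real images, since almost all collected profile images are real.

```python
def expected_false_positives(num_real_images, false_positive_rate):
    """Expected number of real images that are wrongly flagged as generated."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must lie in [0, 1]")
    return num_real_images * false_positive_rate


if __name__ == "__main__":
    dataset_size = 14_989_385  # size of the profile image collection
    # Hypothetical operating points of two detectors (not measured values).
    for fpr in (1e-4, 2e-4):
        count = expected_false_positives(dataset_size, fpr)
        print(f"FPR {fpr:.4%}: about {count:.0f} false positives")
```

Even a difference of one hundredth of a percentage point in the false positive rate translates into roughly 1\,500 additional images that would have to be checked manually, which is on the order of the number of detected fake profiles.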

Table 4. Evaluation of three detector variants trained on different datasets. We report the AUCs under different conditions.
\begin{tabular}{lccc}
\hline
Condition & $\mathcal{C}_{R/F}$ & $\mathcal{C}_{R^{\mathbb{X}}/F^{\mathbb{X}}}$ & $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ \\
\hline
(a) $\mathcal{D}_{R}^{\mathbb{X}}$ vs. $\mathcal{D}_{F}^{\mathbb{X}}$ & 0.9971 & 0.9998 & 0.9998 \\
(b) $\mathcal{D}_{R}^{\mathbb{X}'}$ vs. $\mathcal{D}_{F}^{\mathbb{X}'}$ & 0.9953 & 0.9995 & 0.9997 \\
(c) $\mathcal{D}_{P}^{\mathbb{X}}$ vs. $\mathcal{D}_{F}^{\mathbb{X}}$ & 0.9983 & 0.9994 & 0.9998 \\
\hline
\end{tabular}

8. Discussion and Limitations

Our work systematically examines the prevalence of generated images on Twitter. Despite great effort, our study has limitations that we discuss in the following.

Sampling Bias

The restricted 1\,\% access to Twitter as well as the limited and randomly chosen collection period can introduce a sampling bias \parencite{ArpQuiPen+22} to our study. In particular, the presence of several large clusters with seemingly orchestrated accounts in our collected dataset has a significant effect on our analysis. These clusters and their concrete topics are expected to change over time. Nevertheless, the characteristics, such as the bulk creation of accounts, should generally apply. The same holds for high-level tendencies, such as political amplification or spamming. These are also in line with prior observations on Twitter misuse \parencite{ratkiewiczDetectingTrackingPolitical2011,cresciDecadeSocialBot2020}. Finally, we note a possible bias due to the restructuring of Twitter/$\mathbb{X}$ after the takeover by Elon Musk. It is possible that with the rise of hate speech and bots \parencite{hickeyAuditingElonMusk2023}, the prevalence of generated profile images has also increased. Unfortunately, the current API limits impede a replication of our analysis.

Selection Bias in Analysis

A full analysis of all tweets is beyond the scope of our work. Thus, our cluster analysis is not exhaustive and only focuses on the prevalent trends. Still, this allows us to identify the primary contexts in which generated images are used on Twitter, so that we can draw general conclusions on topics.

Focus on Images Generated from TPDNE

We focus on facial images from TPDNE that are generated by StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020}. Although we are unable to provide statements regarding the prevalence of other types of generated images on Twitter, we expect to cover the most prevalent type. TPDNE has made it considerably easier to access generated images compared to other generative models. Several reports confirm that GAN-generated faces are in fact used by fake social media accounts \parencite{nimmoOperationFFSFakeFace2019,nimmoOperationNavalGazing2020,nimmoIRAAgainUnlucky2020,graphikateamStepMyParler2020,stanfordinternetobservatoryReplyguysGoHunting2020,nimmoSpamouflageGoesAmerica,stanfordinternetobservatoryAnalysisTwitterTakedowns2020,strickAnalysisProchinaPropaganda2021,graphikateamFakeClusterBoosts2021,williamsPortraitModeGAN2022}. Moreover, most alternative models need to be deployed locally. This requires technical knowledge and possibly specialized hardware. Although text-to-image models like Stable Diffusion or Midjourney can be accessed through a browser, generating images at scale may require significant time and additional costs. Obtaining good images can require multiple attempts, and services like Midjourney require payment. Finally, we note that detecting all kinds of generated images, especially in a real-world setting where images are heavily processed, is still an open challenge \parencite{gragnanielloAreGANGenerated2021,corviDetectionSyntheticImages2023}. Therefore, we focus on one setting in which we aim at developing a highly reliable detector.

Likelihood of Overlooked Fake Profiles

As discussed in Section 5, classifying in-the-wild data always requires trading off the number of overlooked fakes against the number of falsely detected real images. While we make our best efforts to evaluate the performance of our detection pipeline under realistic conditions, we cannot exclude that the actual FNR is higher than our estimate. Fake profile images with unusual processing could potentially bypass our detector. Furthermore, our tool-assisted manual labeling process is not guaranteed to be error-free. However, as the FNR on the independent dataset $\mathcal{D}_{D}^{\mathbb{X}}$ closely matches our estimate, the likelihood of overlooked generated images should be low.

9. Related Work

Studying generated faces on social media touches different research areas. In the following, we examine related methods and concepts.

Detecting Generated Images on Social Media

Despite the plethora of proposed fake image detection methods (cf. Section 2), there is only little work on detection in real-world settings. \textcite{boatoTrueFaceDatasetDetection2022} create a synthetic dataset of processed images by sharing real and generated images on different social media platforms. They find that a classifier trained on “original” images is not able to effectively detect shared images, unless it is fine-tuned. This confirms our results in Section 7. In a related work \parencite{marconDetectionManipulatedFace2021}, the same methodology is applied to deepfake videos, yielding similar findings. \textcite{sabelDetectingGeneratedMedia2021} present an approach to detect generated text and profile images on Twitter. They collect tweets related to controversial topics (e.g., COVID-19) and separately classify the tweet’s text and the corresponding profile image. Their method can detect generated media but is highly sensitive to the selected thresholds. High-precision thresholds cause a significant decrease in true positives, resulting in many overlooked generated images.

Closest to our work is the concurrent preprint by \textcite{yangCharacteristicsPrevalenceFake2024}. They estimate the prevalence of generated profile images on Twitter based on 254\,275 randomly sampled accounts using their proposed GANEyeDistance metric. This metric relies on StyleGAN2’s facial alignment by computing the distance between the actual and expected eye location. Their evaluation shows an FDR of 85.86\,\%, so each detected image must be checked manually. In Appendix E, we describe their method in more detail, reproduce it, and compare it with our method. We find that their approach is also vulnerable to simple geometric transformations, making it more likely to overlook generated faces. In contrast, we use a larger dataset and build a more robust detection method. Based on the results, \textcite{yangCharacteristicsPrevalenceFake2024} estimate a lower bound of 0.021--0.044\,\% of active Twitter accounts that use GAN-generated profile images. Our estimated rate of 0.052\,\% is slightly higher, which we attribute to our higher detection performance and the fact that we discard accounts with Twitter’s default profile image.
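To illustrate why a high FDR implies manual verification, the following sketch computes the false discovery rate from counts of correct and incorrect detections. The counts in the demo are hypothetical and are not taken from either study.

```python
def false_discovery_rate(true_positives, false_positives):
    """Fraction of flagged images that are not actually generated."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no images were flagged")
    return false_positives / flagged


if __name__ == "__main__":
    # Hypothetical outcome: 100 flagged images, 14 of them truly generated.
    fdr = false_discovery_rate(true_positives=14, false_positives=86)
    print(f"FDR: {fdr:.2%}, precision: {1 - fdr:.2%}")
```

The FDR is the complement of precision, so an FDR above 85\,\% means that fewer than one in six flagged images is in fact generated.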

Human Perception of Generated Social Media Profiles

Since it is unlikely that artificially generated social media profiles can be prevented completely, studying their effect on humans and our society is crucial. \textcite{minkDeepPhishUnderstandingUser2022} conduct a user study to measure users’ trust towards such profiles in a social engineering context. They find that users are likely to accept a connection request from a LinkedIn profile using generated faces or texts. Even participants who were explicitly informed about the presence of fake accounts had an acceptance rate of 43\,\%. A similar work \parencite{rossiAreDeepLearninggenerated2023}, in which participants were asked to label profiles as real or fake in a Twitter-like environment, shows that human performance is almost equivalent to random guessing (48.9\,\%). These findings emphasize the need for reliable detection methods for generated content in social networks.

Social Media Studies

Complementary to our work, a large body of interdisciplinary research has focused on the misuse of social media \parencite{cresciDecadeSocialBot2020,ferrara_challenges,yardi2010detecting}. For example, the 2016 US elections were marked by accusations of opinion manipulation through automated accounts on social media, particularly on Twitter, which led researchers to investigate these inauthentic and coordinated campaigns \parencite{bessi2016social,badam}. In recent years, research has increasingly focused on the harms caused to online communities and the potential to manipulate public sentiment. Studies have extensively explored the roles of disinformation spread, online conspiracy proliferation, and political interference \parencite{wang2023,shao2018spread,luceri2019evolution}. Another research direction is the identification of inauthentic behavior in the context of financial campaigns \parencite{cresci-financial,tardelli}. Recently, these research efforts have been facing new challenges given the increasing use of AI-generated content by social bots \parencite{ferrara_challenges}.

10. Conclusion

Generative AI provides unprecedented capabilities to create deceptively realistic content, be it images, videos, text, or music. Despite their considerable beneficial applications, these methods also raise significant concerns about their harmful effects. On social media, generated images can be misused to create seemingly real accounts that spread, for example, political misinformation or spam. While the detection of generated content has been explored extensively in controlled laboratory settings, there has been limited systematic research on its prevalence on social media. In this paper, we provide the first systematic large-scale study of generated profile images on Twitter. To obtain a reliable detection method, we build a pipeline step by step, considering different dataset types, pre-filtering, classification, and labeling-assistance methods.

In our dataset of 14\,989\,385 profile images from Twitter, we classify 7\,723 profile images as generated. This is 0.052\,\% of the dataset, showing that generated profile images are notably present on Twitter. Our analysis of the corresponding accounts and their tweets leads to various insights. Fake-image accounts and real-image accounts differ regarding social connections, account activity, account creation time, and availability rate. For example, many fake-image accounts are created in batches and have identical metadata, indicating that they are part of an organized network. The tweet analysis shows that frequently occurring topics are cryptocurrencies, giveaways, content related to pornography and escort services, as well as controversial political discussions.

In summary, our work introduces a detection method for studying generated content on social media. Our analysis underlines that generated images are used as profile images for a wide range of applications. Addressing this threat will require several steps. First, platforms can implement detection algorithms to flag generated content, as Meta has recently announced \parencite{cleggLabelingAIgeneratedImages2024}. Second, watermarking methods (e.g., \parencite{fernandezStableSignatureRooting2023}) that integrate a detectable watermark directly into the generation process can facilitate detection. Finally, raising more awareness about the existence and impact of generated content will be necessary.

Ethics Statement and Data Availability

Working with real-world data from social media carries ethical and privacy-related risks. We take different measures to reduce these risks. In our study, statistics of real accounts are reported in aggregated form. We show personal information, such as profile images and tweet texts, only for accounts using generated images. However, we acknowledge that we cannot completely avoid the risk of falsely labeling a real image as generated.

To foster the development and evaluation of real-world generated image detectors, we plan to share our labeled image datasets. Moreover, to comply with Twitter’s/$\mathbb{X}$’s terms of service (ToS), we will release the IDs of users and tweets from our in-the-wild dataset. Due to the recent changes to Twitter’s API, we are aware that accessing the full dataset based on the IDs is challenging. We therefore invite researchers to contact us to discuss further uses of the dataset and potential collaborations.

Acknowledgements.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2092 CASA – 390781972. Moreover, this work was supported by the Leibniz Association Competition (P101/2020) as well as by the IFI program of the German Academic Exchange Service (DAAD) funded by the Federal Ministry of Education and Research (BMBF).
\printbibliography

Appendix A Methodology Details

Here, we provide implementation details of our methodology.

A.1. Data Collection

In-the-Wild Dataset $\mathcal{D}_{W}^{\mathbb{X}}$

We access the Twitter API using the tweepy \parencite{tweepy} Python package and download the profile image of each tweet’s author from the respective \texttt{profile\_image\_url}. Table 5 lists all metadata fields we obtain from the API. The second column denotes how many accounts in $\mathcal{D}_{W}^{\mathbb{X}}$ have a value in the respective field.

Table 5. Overview of metadata for accounts in our dataset $\mathcal{D}_{W}^{\mathbb{X}}$.
\begin{tabular}{lrl}
\hline
Field & Count & Description \\
\hline
\texttt{id} & 14\,989\,385 & Unique user identifier. \\
\texttt{username} & 14\,989\,385 & Username (handle). \\
\texttt{name} & 14\,989\,385 & Name shown in profile (display name). \\
\texttt{created\_at} & 14\,989\,385 & Account creation time. \\
\texttt{location} & 7\,940\,863 & User-specified location. \\
\texttt{description} & 14\,989\,385 & Profile bio. \\
\texttt{url} & 3\,654\,749 & User-specified URL. \\
\texttt{profile\_image\_url} & 14\,989\,385 & URL to user’s profile image. \\
\texttt{public\_metrics.followers\_count} & 14\,989\,385 & Number of followers. \\
\texttt{public\_metrics.following\_count} & 14\,989\,385 & Number of accounts user is following. \\
\texttt{public\_metrics.tweet\_count} & 14\,989\,385 & Number of tweets. \\
\texttt{public\_metrics.listed\_count} & 14\,989\,385 & Number of lists containing user. \\
\texttt{protected} & 14\,989\,385 & Whether account is private. \\
\texttt{verified} & 14\,989\,385 & Whether account is verified. \\
\texttt{withheld.country\_codes} & 4\,192 & Countries where user is not available. \\
\texttt{pinned\_tweet\_id} & 6\,866\,224 & Identifier of user’s pinned tweet. \\
\texttt{entities.url.urls} & 3\,654\,749 & Details about profile website. \\
\texttt{entities.description.mentions} & 1\,611\,345 & Details about user mentions in description. \\
\texttt{entities.description.urls} & 780\,440 & Details about URLs in description. \\
\texttt{entities.description.hashtags} & 1\,340\,520 & Details about hashtags in description. \\
\texttt{entities.description.cashtags} & 35\,746 & Details about cashtags in description. \\
\hline
\end{tabular}
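The per-field counts in Table 5 can be reproduced in principle by tallying, for each dotted field path, how many account records carry a value. The following is a minimal sketch, assuming each account is stored as a nested dictionary and that a field counts as populated when its key is present and not \texttt{None}; the helper names and the example records are hypothetical and not taken from our pipeline.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence

# A subset of the dotted field paths from Table 5 (shortened for illustration).
FIELDS = (
    "id",
    "location",
    "url",
    "public_metrics.followers_count",
    "entities.description.hashtags",
)


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def field_coverage(records: Iterable[Dict[str, Any]],
                   fields: Sequence[str] = FIELDS) -> Counter:
    """Count, per field, how many records have a value (assumption: present and not None)."""
    counts = Counter({field: 0 for field in fields})
    for record in records:
        for field in fields:
            if get_path(record, field) is not None:
                counts[field] += 1
    return counts


# Hypothetical example records, not real accounts.
example_records = [
    {"id": "1", "location": "Bochum", "public_metrics": {"followers_count": 0}},
    {
        "id": "2",
        "url": "https://example.org",
        "public_metrics": {"followers_count": 5},
        "entities": {"description": {"hashtags": [{"tag": "ai"}]}},
    },
]
coverage = field_coverage(example_records)
```

Note that a follower count of zero still counts as a value under this convention, which matches the full coverage of the \texttt{public\_metrics} fields in Table 5.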

Labeled Datasets $\mathcal{D}_{R}$/$\mathcal{D}_{F}$ and Variations

We collect 10\,000 images from TPDNE by repeatedly querying the website, mimicking a user creating a fake profile. We analogously take the first 10\,000 real images from the FFHQ dataset. To avoid an unwanted bias based on image processing, we convert the PNG files from FFHQ to JPEG using the same parameters as TPDNE. Then, to obtain processed images as they would appear on Twitter ($\mathcal{D}_{R}^{\mathbb{X}}$ and $\mathcal{D}_{F}^{\mathbb{X}}$), we upload each image as a profile image and download it. We observed a difference in the image processing between API-based and browser-based uploads. Images uploaded with the API kept their resolution, while images uploaded in the browser were resized to $400\times 400$ pixels. As the majority of in-the-wild images have the resized resolution, we select the browser-based approach and automate the upload using the web automation framework Selenium \parencite{selenium}. To obtain the zoomed-in versions ($\mathcal{D}_{R}^{\mathbb{X}^{\prime}}$ and $\mathcal{D}_{F}^{\mathbb{X}^{\prime}}$), we extend the automated upload procedure by first zooming into each image by a random amount and then moving the image by a random x- and y-offset.
We ensure that the image still looks like a plausible profile image at the maximum zoom rate.
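The geometric effect of the zoom-and-shift step can be illustrated independently of the browser automation. The sketch below is not the Selenium procedure itself: it only computes which region of the original image remains visible after zooming and shifting. The uniform sampling, the maximum zoom factor of 1.5, and the function name are assumptions made for illustration.

```python
import random
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def random_zoom_crop(width: float, height: float, max_zoom: float = 1.5,
                     rng: Optional[random.Random] = None) -> Tuple[Box, float]:
    """Sample the visible region after a random zoom and a random x/y offset.

    max_zoom is a hypothetical value; the zoom limit is chosen in practice so
    that the result still looks like a plausible profile image.
    """
    rng = rng or random.Random()
    zoom = rng.uniform(1.0, max_zoom)
    # Zooming in by a factor z leaves a window of size (w / z, h / z) visible.
    crop_w = width / zoom
    crop_h = height / zoom
    # The offsets are limited so that the window stays inside the image.
    left = rng.uniform(0.0, width - crop_w)
    top = rng.uniform(0.0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h), zoom


box, zoom = random_zoom_crop(1024, 1024, rng=random.Random(0))
```

Because width and height are divided by the same factor, the visible window keeps the aspect ratio of the original image.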

A.2. Pre-Filter

BlazeFace \parencite{bazarevskyBlazeFace2019} predicts a bounding box as well as the x- and y-positions of six facial landmarks (eyes, ears, mouth, and nose) in normalized coordinates between 0 and 1. If an image contains multiple faces, we select the one with the largest bounding box.
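The selection rule for images with multiple faces can be sketched as follows. This is a minimal illustration that assumes "largest" refers to the bounding-box area in normalized coordinates; the \texttt{Face} container is hypothetical and does not mirror the BlazeFace output format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    """Bounding box of one detected face in normalized coordinates (0 to 1)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        # Degenerate boxes get an area of zero instead of a negative value.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def select_largest_face(faces: Sequence[Face]) -> Optional[Face]:
    """Return the face with the largest bounding-box area, or None if no face was found."""
    if not faces:
        return None
    return max(faces, key=lambda face: face.area)


detections = [Face(0.1, 0.1, 0.3, 0.3), Face(0.2, 0.2, 0.9, 0.9)]
main_face = select_largest_face(detections)
```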

A.3. Classifier

Our architecture and training procedure are adapted from \textcite{wangCNNgeneratedImagesAre2020}. We follow the common practice of initializing a ResNet-50 \parencite{heDeepResidualLearning2016} with weights from an image classifier trained on ImageNet \parencite{russakovskyImageNetLargeScale2015} and replace the final layer to reflect the binary classification setting. During training, we use a batch size of 32 and optimize the model using Adam \parencite{kingmaAdamMethodStochastic2015} and binary cross-entropy loss. In the case of $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$, we ensure balanced sampling of real/proxy-labeled real and fake samples. The learning rate is reduced by a factor of 10 if the validation loss does not decrease by 0.001 during 5 epochs. We perform early stopping once the learning rate becomes smaller than $10^{-6}$. For training $\mathcal{C}_{R/F}$, the images in $\mathcal{D}_{R}$ and $\mathcal{D}_{F}$ are resized to $400\times 400$ using bilinear interpolation to match the profile image dimensions of Twitter.
The training data is augmented using three kinds of perturbations, each applied with probability $p=0.1$: Gaussian blurring with a kernel size of 9 and $\sigma$ uniformly sampled from $[0.5,5.0]$, JPEG compression with quality uniformly sampled from $[30,100]$, and resizing, with scale and aspect ratio uniformly sampled from $[0.25,0.75]$ and $[0.8,1.25]$, respectively. During training, we randomly extract crops of size $224\times 224$, while we take the center crop of the same size during validation and testing.
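The learning rate schedule and the stopping criterion described above can be sketched in plain Python. This is one possible reading of the rule, not the training code: the learning rate is divided by 10 after five consecutive epochs without an improvement of at least 0.001 over the best validation loss so far. The initial learning rate of $10^{-4}$ and the class and function names are assumptions.

```python
from typing import Iterable, Optional


class PlateauSchedule:
    """Reduce the learning rate on validation-loss plateaus and signal early stopping."""

    def __init__(self, lr: float = 1e-4,    # hypothetical initial learning rate
                 factor: float = 0.1,       # reduce by a factor of 10
                 patience: int = 5,         # epochs without sufficient improvement
                 min_delta: float = 1e-3,   # required decrease of the validation loss
                 min_lr: float = 1e-6):     # stop once the learning rate falls below this
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Register one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0
        # The small tolerance guards against floating-point rounding when the
        # learning rate is nominally equal to min_lr.
        return self.lr < self.min_lr * (1.0 - 1e-9)


def epochs_until_stop(val_losses: Iterable[float],
                      schedule: PlateauSchedule) -> Optional[int]:
    """Return the 1-based epoch at which training stops, or None if it never stops."""
    for epoch, loss in enumerate(val_losses, start=1):
        if schedule.step(loss):
            return epoch
    return None


# With a constant validation loss, three reductions are needed before stopping.
stop_epoch = epochs_until_stop([1.0] * 100, PlateauSchedule())
```

Under these assumptions, a constant validation loss leads to reductions to $10^{-5}$, $10^{-6}$, and $10^{-7}$, and training stops after the third reduction. Library schedulers may count the patience window slightly differently, so the exact epoch of each reduction should not be read as a property of our training runs.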

Appendix B Inversion Examples

Figure 11 depicts example images to demonstrate the assisted manual labeling. The left image is the original while the right image is its reconstruction obtained by GAN inversion. For the real image from $\mathcal{D}_{R}$, we observe that the background is inaccurate and the face is slightly blurred. In contrast, the generated image from $\mathcal{D}_{F}$ can be reconstructed very accurately, including the background.

Figure 11. Examples of original images (left) and their reconstructions (right). (a) Real image from $\mathcal{D}_{R}$. (b) Generated image from $\mathcal{D}_{F}$.

Appendix C Pre-Trained Detectors

Here we provide details on the three existing pre-trained classifiers we evaluate in Section 7.1. $\mathcal{C}_{\text{Wang}}$ \parencite{wangCNNgeneratedImagesAre2020} is the model on which our detector is based. However, it is trained on a diverse set of images generated by ProGAN \parencite{karrasProgressiveGrowingGANs2018} and corresponding real images from LSUN \parencite{yuLSUNConstructionLargescale2016}. We select the version Blur+JPEG (0.1) since the authors report good performance on images generated by StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020}. $\mathcal{C}_{\text{Grag}}$ \parencite{gragnanielloAreGANGenerated2021} is an improved version of $\mathcal{C}_{\text{Wang}}$ that avoids downsampling in the first layer of the ResNet-50 \parencite{heDeepResidualLearning2016} backbone to preserve high-frequency artifacts (at the cost of a larger model). Besides training on ProGAN \parencite{karrasProgressiveGrowingGANs2018} images, the authors provide a detector trained on StyleGAN2 \parencite{karrasAnalyzingImprovingImage2020} images, which we select since it should yield the best results on our dataset. Finally, $\mathcal{C}_{\text{Ojha}}$ follows a different approach and leverages the feature space of a pre-trained vision transformer (CLIP-ViT \parencite{dosovitskiyImageWorth16x162020, radfordLearningTransferableVisual2021}). It uses a single linear layer on top (trained on ProGAN \parencite{karrasProgressiveGrowingGANs2018} images) to predict whether an image is real or fake.

Appendix D Duplicate Image Detection

Although TPDNE makes access to generated faces trivial, creators of fake account clusters might use the same face for multiple accounts. To identify such duplicates, we need an approach that is robust to subtle differences caused by varying image processing. We adapt the technique used by previous works \parencite{garimellaImagesMisinformationPolitical2020,zannettouOriginsMemesMeans2018,wangUnderstandingUseImages2023} and cluster images based on their perceptual hashes (pHashes). Perceptual image hashing \parencite{faridOverviewPerceptualHashing2021} aims to extract a meaningful representation of an image that does not depend on individual pixel values, but on the perceived content. The algorithm we use \parencite{buchnerImageHash} achieves this by deriving 64 bits from the DCT coefficients belonging to the lower frequencies of an image. To obtain groups of duplicate images, we apply the DBSCAN \parencite{esterDensitybasedAlgorithmDiscovering1996} clustering algorithm to our calculated pHashes. We use the implementation from scikit-learn \parencite{scikit-learn} and set the minimum number of elements to 2. We empirically find that we obtain meaningful clusters by setting the maximum allowed Hamming distance between two pHashes to 3.
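The grouping step can be illustrated without external dependencies. The sketch below does not call scikit-learn; it relies on the observation that, under scikit-learn's convention that a point counts towards its own neighborhood, DBSCAN with a minimum of 2 elements yields exactly the connected components (of size at least 2) of the graph that links two hashes whenever their Hamming distance is at most 3. The hashes are assumed to be given as 64-bit integers, and the function names and example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

MAX_HAMMING = 3       # maximum allowed Hamming distance between two pHashes
MIN_CLUSTER_SIZE = 2  # minimum number of elements per group


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def cluster_hashes(hashes: Sequence[int], max_dist: int = MAX_HAMMING) -> List[List[int]]:
    """Return groups of indices whose hashes are linked by chains of near-duplicates."""
    parent = list(range(len(hashes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; acceptable for a sketch, not for millions of images.
    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming(hashes[i], hashes[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(hashes)):
        groups[find(i)].append(i)
    # Hashes without any near-duplicate correspond to DBSCAN noise and are dropped.
    return sorted(g for g in groups.values() if len(g) >= MIN_CLUSTER_SIZE)


# Hypothetical 64-bit hashes: two groups of near-duplicates and one isolated image.
example_hashes = [
    0x0000000000000000,
    0x0000000000000001,
    0x0000000000000007,
    0xFFFF000000000000,
    0xFFFF000000000001,
    0x0F0F0F0F0F0F0F0F,
]
duplicate_groups = cluster_hashes(example_hashes)
```

As with DBSCAN, two images can end up in the same group even if their direct distance exceeds the threshold, provided they are connected through intermediate near-duplicates.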

In total, we identify 540 groups of duplicated images with an average size of 4.88 images. The distribution of the sizes is given in Figure 12. About half of all groups consist of only two or three duplicated images, while the most frequently used faces appeared in 18 profiles.

Figure 12. Size distribution of duplicate image clusters.

Appendix E Evaluation of Alignment-Based Detection

In the concurrent work by \textcite{yangCharacteristicsPrevalenceFake2024}, the authors identify GAN-generated faces on Twitter using a method that is related to our concept of alignment (cf. Section 3.3). They define the GANEyeDistance $\mathcal{G}$ as the normalized Euclidean distance between the actual and expected location of each eye. They propose to consider an image to be potentially GAN-generated if $\mathcal{G}<0.02$. To reach a final decision, they propose to manually classify images based on visual artifacts. While this approach is easy to implement and computationally efficient, we find that it is suboptimal regarding (a) the number of false positives (causing a large manual workload) and (b) the number of false negatives (overlooking generated faces that are not aligned).
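To make the thresholding rule concrete, the following sketch computes an eye-distance score from eye positions in normalized image coordinates. It is an illustration under stated assumptions rather than the implementation of the concurrent work: the expected eye positions are placeholder values, and averaging the two per-eye distances is our assumption, since the exact aggregation is not specified here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in normalized coordinates between 0 and 1

# Hypothetical reference positions of the left and right eye in an aligned face.
EXPECTED_EYES: Tuple[Point, Point] = ((0.38, 0.48), (0.62, 0.48))
THRESHOLD = 0.02  # threshold on the score, as stated in the text


def gan_eye_distance(left_eye: Point, right_eye: Point,
                     expected: Tuple[Point, Point] = EXPECTED_EYES) -> float:
    """Mean Euclidean distance between the detected and expected eye positions (assumption)."""
    d_left = math.dist(left_eye, expected[0])
    d_right = math.dist(right_eye, expected[1])
    return (d_left + d_right) / 2.0


def is_candidate(left_eye: Point, right_eye: Point, threshold: float = THRESHOLD) -> bool:
    """Flag an image as potentially GAN-generated if the eyes are close to the reference."""
    return gan_eye_distance(left_eye, right_eye) < threshold


# Perfectly aligned eyes are flagged; eyes pushed apart by a zoom are not.
aligned = is_candidate((0.38, 0.48), (0.62, 0.48))
zoomed = is_candidate((0.33, 0.47), (0.67, 0.47))
```

The second call mimics a face that was zoomed in slightly, which moves both eyes away from the reference positions and pushes the score above the threshold.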

We test $\mathcal{G}$ with the suggested threshold on 150\,000 randomly chosen images from $\mathcal{D}_{W}^{\mathbb{X}}$ (about 1\,\%), which yields 730 candidate profiles. Extrapolated to all samples in $\mathcal{D}_{W}^{\mathbb{X}}$, the estimated number of candidates is therefore about 73\,000. Manually classifying these images would require excessive effort.
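The extrapolation is a simple proportional scaling from the sample to the full dataset, which the following snippet spells out. The numbers are taken from the text above; the function name is hypothetical, and the calculation assumes that the random sample is representative of the full dataset.

```python
SAMPLE_SIZE = 150_000        # randomly chosen images that were tested
SAMPLE_CANDIDATES = 730      # images below the threshold in the sample
DATASET_SIZE = 14_989_385    # all images in the in-the-wild dataset


def extrapolate(sample_hits: int, sample_size: int, population_size: int) -> float:
    """Scale the hit rate observed in a random sample to the whole population."""
    return sample_hits / sample_size * population_size


sample_fraction = SAMPLE_SIZE / DATASET_SIZE
estimated_candidates = extrapolate(SAMPLE_CANDIDATES, SAMPLE_SIZE, DATASET_SIZE)
print(f"sample fraction: {sample_fraction:.2%}")
print(f"estimated candidates: {estimated_candidates:.0f}")
```

The estimate is slightly below 73\,000 because the sample covers marginally more than 1\,\% of the dataset.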

Figure 13. Examples of fake images that evade alignment-based detection. Below each image we provide its GANEyeDistance $\mathcal{G}$. The reference eye position is highlighted. Note that we only display images which we confidently consider to be fake to avoid disclosing real profile images.

On the other hand, we find 440 images in $\mathcal{D}_{W}^{\mathbb{X}}$ that are detected as fake by $\mathcal{C}_{R^{\mathbb{X}},P^{\mathbb{X}}/F^{\mathbb{X}}}$ but are overlooked when classifying based on $\mathcal{G}$. Naturally, it can be assumed that in this subset our classifier has a higher number of false positives, since most generated images are in fact aligned. Still, after manual inspection, we rate 303 of these images to be definitely or very likely generated. Note that manual labeling is more challenging on these images since we cannot resort to GAN inversion. Figure 13 depicts some examples together with their value of $\mathcal{G}$. One can see that zooming in by a small amount is sufficient to cause a misalignment. We consider it probable that malicious accounts do this on purpose to appear more credible and avoid detection based on facial landmarks.