- AIGC: AI-generated content
- FNR: false negative rate
- FDR: false discovery rate
- TPR: true positive rate
- FPR: false positive rate
- GAN: generative adversarial network
- TPDNE: thispersondoesnotexist.com
- TF-IDF: term frequency–inverse document frequency
AI-Generated Faces in the Real World:
A Large-Scale Case Study of Twitter Profile Images
Abstract.
Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking.
In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.
1. Introduction
The emergence of generative artificial intelligence (AI) has revolutionized content creation, enabling us to produce highly authentic and diverse outputs, such as images, videos, texts, and music that bear a striking resemblance to human-created media. These AI-driven systems have become ubiquitous in various areas of our society and provide reliable support in numerous applications. Among other use cases, they streamline the writing of emails and texts or enhance programming with advanced code completion tools. However, alongside its impressive benefits, generative AI also has the potential for significant detrimental effects. A pressing problem is the ability to generate compellingly realistic but false content, which can be used as a way to spread misinformation, manipulate people, and influence public opinion.
In a significant action in late 2019, Facebook dismantled an extensive network of over 900 accounts, pages, and groups that had collectively spent more than 9 million USD on advertisements promoting Donald Trump, potentially impacting the 2020 US presidential election \parencitenimmoOperationFFSFakeFace2019. A notable feature of this network was the use of AI-generated profile images, possibly taken from the website thispersondoesnotexist.com (TPDNE) which became operational in February 2019. Using NVIDIA’s StyleGAN \parencitekarrasStylebasedGeneratorArchitecture2019, TPDNE generates a new facial image every time the page is refreshed, making it easily accessible to everyone. Since this incident, the use of AI-synthesized faces in disinformation campaigns has been on the rise, likely because such images reduce the risk of detection through reverse image searches \parencitegoldsteinHowDisinformationEvolved2021. Investigations have revealed that many of these deceptive clusters operate with state interests in mind, seeking to bolster specific narratives \parencitenimmoOperationNavalGazing2020,nimmoSpamouflageGoesAmerica,strickAnalysisProchinaPropaganda2021,stanfordinternetobservatoryAnalysisTwitterTakedowns2020 or interfere in the domestic policies of foreign states \parencitenimmoIRAAgainUnlucky2020,graphikateamStepMyParler2020,graphikateamFakeClusterBoosts2021. Additionally, there are efforts to influence public opinion \parencitestanfordinternetobservatoryReplyguysGoHunting2020,strickWestPapuaNew2020 or establish connections with unsuspecting social media users \parencitevincentSpyReportedlyUsed2019,goldsteinResearchNoteThis2022. The FBI and Europol have expressed concerns that the trend of using AI-generated content in cybercrime and foreign influence operations is expected to grow steadily \parenciteMaliciousActorsAlmost2021,Europol2024. 
Given these examples, it is essential to understand how well AI-generated images can be detected, how prevalent they are, and how they are used in the wild rather than in a lab setting.
In this work, we tackle this challenge by concentrating on the phenomenon of AI-generated images in social media. At the time of writing, it is becoming increasingly difficult for humans to differentiate these machine-generated media from authentic photographs, as evidenced by recent studies \parencitehulzeboschDetectingCNNgeneratedFacial2020, tucciarelliRealnessPeopleWho2020, nightingaleSyntheticFacesHow2021, shenStudyHumanPerception2021, lagoMoreRealReal2022, nightingaleAIsynthesizedFacesAre2022, frankRepresentativeStudyHuman2023. Although the detection of generated images has been explored extensively in lab settings, there is a surprising lack of comprehensive research addressing their identification and widespread use on social media platforms in real-world contexts. In this paper, we provide the first systematic and large-scale study of AI-generated profile images on Twitter. Our research is founded on three main pillars.
First, we develop a fast and effective detection pipeline tailored to the identification of AI-generated images in real-world scenarios. This task presents unique challenges, including the lack of a definitive ground truth and the diversity of possible image manipulations. To solve these problems, we carefully design a detection pipeline step by step. We consider different dataset types, apply a pre-filter to discard images with too small or no faces, and adapt a state-of-the-art classification model specifically targeting synthetic profile images on Twitter. As mentioned above, observations suggest that the majority of AI-generated profile images originate from TPDNE, which is why we tailor our detection pipeline to this kind of fake faces. Finally, we integrate various tools that help with the manual labeling that is required to estimate error rates on unlabeled in-the-wild data. We study each component of our system in controlled setups and show that the pipeline is capable of accurately recognizing AI-generated images.
Second, we analyze a large collection of Twitter profile pictures to determine how prevalent AI-generated profile pictures are on the platform. We identify accounts that use such images, which corresponds to a prevalence rate of . This result indicates a notable presence of generated profile images on Twitter. We also assess the accuracy and reliability of our findings by estimating error rates. We estimate the false negative rate (FNR)—the fraction of mislabeled fake images—of our approach to lie between and , and the false discovery rate (FDR)—the fraction of real images among all images classified as fake—to be . The results suggest a low error rate of our method.
Third, we contextualize the use of AI-generated profile pictures on Twitter by examining the corresponding accounts and their tweets. Our results show clear differences between the two types of accounts: accounts with fake images tend to have lower social engagement as well as fewer followers and followed accounts. Despite the generally lower activity, some accounts with fake images are very active, suggesting possible involvement in spam campaigns. In addition, fake accounts are often newer and are suspended more frequently by Twitter, indicating inauthentic behavior. A significant portion of accounts was created in bulk shortly before our data collection, which is a common pattern for accounts created for message amplification, disinformation campaigns, or similar disruptive activity. This impression is confirmed by our textual analysis of the accounts’ tweets. We identify large clusters spamming very similar contents, frequently referring to giveaways, cryptocurrencies, and pornography. Notably, we also observe accounts that engage in contentious or political topics, such as the war in Ukraine, debates on COVID and vaccinations, and election-related discourse.
Contributions
We make the following key contributions:
(1) Detection Pipeline. We propose a multi-step pipeline for detecting AI-generated profile images on social media. We evaluate each stage in a controlled setup and demonstrate the pipeline’s suitability for real-world settings.
(2) Prevalence Study on Twitter. We apply our pipeline to authentic profile images to systematically study the prevalence of AI-generated faces on Twitter. We identify accounts with generated profile images, corresponding to a prevalence rate of .
(3) Account and Tweet Analysis. We analyze the user metrics and tweets of accounts using AI-generated profile images to learn more about their intended purpose. We identify prevalent topics and find that a significant number of accounts apparently participate in coordinated inauthentic behavior.
2. Background
We start by providing a short primer on the creation and detection of AI-generated images.
AI-Generated Content (AIGC)
AIGC, sometimes also referred to as “deepfakes”, is content that appears authentic to humans but is synthesized or altered using a deep neural network. It is most prominently associated with manipulated videos in which the face of a person is replaced with a different one \parencitemirskyCreationDetectionDeepfakes2021, but also encompasses other types of media including images, audio, and text. While AIGC offers great creative potential, it is also used for malicious purposes, including defamatory images and videos \parencitecoleAIassistedFakePorn2017, voice cloning \parencitegaoVoiceImpersonationUsing2018,damianiVoiceDeepfakeWas, fake customer reviews \parenciteyaoAutomatedCrowdturfingAttacks2017, and machine-generated posts on social media \parencitefagniTweepFakeDetectingDeepfake2021,goldsteinGenerativeLanguageModels2023.
Image Synthesis
Learning a probability distribution from samples in order to generate novel samples is a longstanding challenge, especially in the high-dimensional image domain. Besides variational autoencoders (VAEs) \parencitekingmaAutoencodingVariationalBayes2014 and autoregressive models \parenciteoordPixelRecurrentNeural2016a,vandenoordConditionalImageGeneration2016, generative adversarial networks \parencitegoodfellowGenerativeAdversarialNets2014 have proven to be effective in synthesizing high-quality images \parencitezhuUnpairedImagetoimageTranslation2017,choiStarGANUnifiedGenerative2018,karrasProgressiveGrowingGANs2018,karrasStylebasedGeneratorArchitecture2019,karrasAnalyzingImprovingImage2020,karrasAliasFreeGenerativeAdversarial2021,sauerProjectedGANsConverge2021,kangScalingGANsTexttoimage2023. The StyleGAN family \parencitekarrasStylebasedGeneratorArchitecture2019,karrasAnalyzingImprovingImage2020,karrasAliasFreeGenerativeAdversarial2021 received special attention due to their ability to generate faces that are practically indistinguishable from real ones \parencitenightingaleAIsynthesizedFacesAre2022. Recently, it has been shown that diffusion models (DMs) \parencitesohl-dicksteinDeepUnsupervisedLearning2015,hoDenoisingDiffusionProbabilistic2020,dhariwalDiffusionModelsBeat2021 are able to match and even surpass the visual quality of GAN-generated images.
Generated Image Detection
There is a continuing arms race for effective detection techniques and newer generations of image synthesis algorithms. Broadly speaking, generated image detection techniques can be divided into two categories: methods that rely on handcrafted features and learning-based methods. Methods from the first category either exploit visual defects (e.g., facial inconsistencies \parencitematernExploitingVisualArtifacts2019, impossible reflections \parencitehuExposingGANGeneratedFaces2021, irregular pupil shapes \parenciteguoEyesTellAll2022) or “invisible” characteristics such as frequency artifacts \parencitezhangDetectingSimulatingArtifacts2019,durallWatchYourUpconvolution2020,frankLeveragingFrequencyAnalysis2020,chandrasegaranCloserLookFourier2021,schwarzFrequencyBiasGenerative2021,chenSSDGANMeasuringRealness2021, pixel statistics \parencitenatarajDetectingGANGenerated2019,mccloskeyDetectingGANgeneratedImagery2019, or model-specific properties \parencitemarraGANsLeaveArtificial2019,yuAttributingFakeImages2019,rickerAEROBLADE2024. Learning-based methods, on the other hand, use neural networks to learn a suitable feature representation to distinguish fake from real images \parencitemarraDetectionGANgeneratedFake2018,chaiWhatMakesFake2020,hulzeboschDetectingCNNgeneratedFacial2020,wangCNNgeneratedImagesAre2020,gragnanielloAreGANGenerated2021,cozzolinoUniversalGANImage2021,ojhaUniversalFakeImage2023,corviDetectionSyntheticImages2023.
3. Methodology
A large-scale study on generated images in the wild comes with multiple challenges. First, we do not know the ground truth. As a result, it is difficult to estimate the amount of overlooked generated images (false negatives) and to be sure that an image detected as generated is actually generated (precision). Finally, studying millions of images comes with a computational overhead so that the detection method has to be efficient, too. We discuss further challenges and limitations of our study in Section 8. To deal with all these challenges, we carefully design a multi-step detection pipeline. The following is a step-by-step description of this pipeline. Note that while the presented approach is applied to Twitter, our method can be adapted to any other social network. We provide implementation details in Appendix A.
3.1. Data Collection
We describe the four types of datasets that we use for studying generated images, with Twitter being our use case. Table 1 summarizes our notation.
In-The-Wild Dataset
To estimate the prevalence of generated images on a social network, it is important to obtain a mostly unconditional sample. In the case of Twitter, this can be achieved by using the API endpoint that provides real-time access to a random subset of all publicly posted tweets. We download each author’s profile image together with their profile metadata (cf. Section A.1 for an overview). Note that this approach only enables us to obtain profile images from users who write posts during the data collection period. Additionally, we omit users who have not set a profile image, that is, who are using Twitter’s default profile image. From March 7 to March 15 2023, we collected profile images.
Labeled Datasets / and Variations
We continue with labeled datasets of fake and real images which can be used to train a detector. As discussed in Section 1, existing observations suggest that the vast majority of generated profile images on Twitter are taken from TPDNE, which generates images with StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020 trained on the FFHQ \parencitekarrasStylebasedGeneratorArchitecture2019 dataset. (When published in 2019, TPDNE used the original StyleGAN \parencitekarrasStylebasedGeneratorArchitecture2019, but switched to StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020 shortly after its release.) We therefore decide to focus on this specific kind of fake faces and use images from TPDNE as our fake-labeled dataset (denoted by ) and correspondingly images from FFHQ as our real-labeled dataset (denoted by ). We discuss this limitation of focusing on TPDNE in Section 8. As prior work shows that processing operations like resizing and compression can affect the detection \parenciteparmarOnAliasedResizing2022, mandelliTrainingCNNsPresence2020, we consider two dataset variations:
- and . To obtain profile images with the social network’s processing steps, we adapt the approach from \textciteboatoTrueFaceDatasetDetection2022. We upload both and to Twitter, set each image as profile image, and then download all images again. We denote these processed images by and .
- and . We additionally simulate a user who zooms into the profile image during the upload, as is common on social media platforms. We denote these images by and , respectively.
We confirm in Section 7.2 that considering the preprocessing indeed improves the detection performance under realistic conditions.
Proxy-Labeled Real Dataset
Social media platforms often have very popular users with a lot of followers. These popular users are rather unlikely to use deceptive fake images. Hence, we can build a proxy-labeled dataset with presumably real images. In particular, we select profile images from the accounts in with the highest numbers of followers that also pass our pre-filter (which is presented in the next section). We denote the so-created proxy-labeled dataset of real profile images by .
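The selection of the proxy-labeled real dataset can be sketched as follows. This is a minimal illustration under stated assumptions: accounts are represented as dictionaries with a `followers` field, the pre-filter is passed in as a callable, and the pre-filter is applied before ranking by follower count. The names `select_proxy_real`, `followers`, and `has_face`, as well as the value of `k`, are hypothetical and do not come from the study.

```python
from typing import Callable, Dict, List


def select_proxy_real(
    accounts: List[Dict],
    k: int,
    passes_prefilter: Callable[[Dict], bool],
) -> List[Dict]:
    """Return the k most-followed accounts whose profile image passes the pre-filter.

    Assumption: filtering happens first, then the remaining accounts
    are ranked by follower count in descending order.
    """
    eligible = [account for account in accounts if passes_prefilter(account)]
    ranked = sorted(eligible, key=lambda account: account["followers"], reverse=True)
    return ranked[:k]


# Toy usage with a stand-in pre-filter that reads a precomputed flag.
toy_accounts = [
    {"id": "a", "followers": 10, "has_face": True},
    {"id": "b", "followers": 5000, "has_face": False},
    {"id": "c", "followers": 900, "has_face": True},
    {"id": "d", "followers": 1200, "has_face": True},
]
proxy_real = select_proxy_real(toy_accounts, k=2, passes_prefilter=lambda a: a["has_face"])
```

Filtering before ranking guarantees that the result contains exactly `k` images whenever enough eligible accounts exist.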
Documented Fakes Dataset
Finally, there are documented cases of generated profile images that were discovered manually. For example, blog posts regularly report such images when analyzing inauthentic Twitter accounts \parencitenortenoConspiradorNortenoSubstack2024. These cases can be used to build a labeled dataset of fake images in the wild, which we denote by . Such a dataset is not free of bias, but provides a good means to finally check the performance of our classifier on an independent source. For our study, we use a dataset of generated Twitter profile images that were manually collected between November 2022 and May 2023 \parenciteyangCharacteristicsPrevalenceFake2024.
| Symbol | Description |
|---|---|
| | Unlabeled dataset of Twitter profile images. |
| | Labeled dataset of fake images. |
| | Labeled dataset of fake images uploaded as profile image and downloaded afterward. |
| | Version of where images are zoomed into during upload. |
| | Labeled dataset of real images. |
| | Labeled dataset of real images uploaded as profile image and downloaded afterward. |
| | Version of where images are zoomed into during upload. |
| | Proxy-labeled dataset of supposedly real Twitter profile images. |
| | Labeled dataset of documented fake Twitter profile images. |
3.2. Detection
Equipped with these different datasets, we can proceed with the detection of generated profile images. Here, we propose a two-stage procedure to improve the accuracy and the efficiency.
Pre-Filter
We start with a pre-filter to discard irrelevant samples. In our case, we can discard images without any face or where the face is too small. We use the efficient BlazeFace \parencitebazarevskyBlazeFace2019 face detector to detect faces and locate facial landmarks. An image passes if at least one face is detected and the Euclidean distance between the coordinates of both eyes is greater than or equal to . The pre-filter serves two purposes: First, the overall computational cost decreases because fewer candidates reach the subsequent, more demanding detection stage. Second, the detection stage is trained on facial images, so that other types of profile images, such as logos or monochrome images, could be wrongly classified as fake. Filtering irrelevant images can therefore decrease the false positive rate (FPR).
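A minimal sketch of the pass rule described above, assuming the face detector has already returned one pair of eye coordinates per detected face. The detector call itself is omitted, and the minimum eye distance is a parameter rather than a fixed value. The names `eye_distance`, `passes_prefilter`, and `min_eye_distance` are hypothetical. Letting an image pass if any detected face is large enough is also an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def eye_distance(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance between the two eye landmarks."""
    return math.hypot(left_eye[0] - right_eye[0], left_eye[1] - right_eye[1])


def passes_prefilter(faces: Sequence[Tuple[Point, Point]], min_eye_distance: float) -> bool:
    """Return True if at least one detected face has a sufficient eye distance.

    `faces` holds one (left_eye, right_eye) pair per detected face;
    an empty sequence means that no face was detected.
    """
    return any(
        eye_distance(left, right) >= min_eye_distance for left, right in faces
    )
```

An image without any detected face yields an empty sequence and is discarded, which matches the first condition of the rule.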
Classifier
To automatically label a profile image as real or fake, we use a state-of-the-art CNN detector based on ResNet-50 \parenciteheDeepResidualLearning2016. Previous work \parencitewangCNNgeneratedImagesAre2020, mandelliTrainingCNNsPresence2020, cozzolinoSpoCSpoofingCamera2021, cozzolinoUniversalGANImage2021, gragnanielloAreGANGenerated2021 has demonstrated that this model is able to effectively distinguish real from generated images and that it provides good generalization capabilities. We initially attempted to use pre-trained fake image detectors. However, we found that the heavy pre-processing performed by Twitter makes it necessary to train our own detector (cf. Section 7.1). In particular, we train on the combination of and for real images, and for fake images. The resulting final classifier is denoted by . Note that we experiment with using other dataset variations to train a classifier in our ablation study in Section 7.2. Yet, using processed real, fake, and proxy-labeled real images provides the highest performance for processed and zoomed inputs.
3.3. Assistance for Manual Labeling
To estimate error rates of our detection scheme on unlabeled in-the-wild data, it is necessary to manually label these images as real or fake. As generated images have reached a level of quality which makes them almost indistinguishable from real images \parencitehulzeboschDetectingCNNgeneratedFacial2020, tucciarelliRealnessPeopleWho2020, nightingaleSyntheticFacesHow2021, shenStudyHumanPerception2021, lagoMoreRealReal2022, nightingaleAIsynthesizedFacesAre2022, frankRepresentativeStudyHuman2023, we use two tools to facilitate this process.
Alignment
Faces generated by StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020 are characterized by being almost perfectly aligned with respect to their facial landmarks, caused by the alignment of the training dataset FFHQ. By superimposing multiple images, this characteristic has been leveraged to visually identify clusters of fake accounts in social networks \parencitenimmoOperationNavalGazing2020, nimmoIRAAgainUnlucky2020, graphikateamStepMyParler2020, graphikateamFakeClusterBoosts2021, stanfordinternetobservatoryReplyguysGoHunting2020, strickAnalysisProchinaPropaganda2021, goldsteinResearchNoteThis2022. We automate this manual process by extracting facial landmarks with BlazeFace \parencitebazarevskyBlazeFace2019 and computing the deviation from a reference. For each landmark (x- and y-coordinates of eyes, ears, mouth, and nose), we compute its mean and standard deviation over a reference dataset. In our study, we use the training subset of as reference. We define an image as being aligned, if the condition
(1) |
holds, where are the landmarks extracted from the image and controls the maximum deviation from the reference. We set . During our evaluation in Section 4, we find that this is the lowest value at which all generated images in the validation set of are aligned. While a close alignment hints towards a face generated by StyleGAN2, it is ineffective if the image has been cropped or geometrically transformed.
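The alignment check can be illustrated with a short sketch. The exact form of condition (1) is not reproduced here, so the code assumes one plausible reading: every landmark coordinate must lie within a fixed number of standard deviations of its reference mean. The function names, the use of the population standard deviation, and the parameter `max_dev` are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_reference(landmarks: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-coordinate mean and standard deviation over a reference set.

    Each entry of `landmarks` is the flattened coordinate vector of one image.
    """
    columns = list(zip(*landmarks))
    mus = [mean(column) for column in columns]
    sigmas = [pstdev(column) for column in columns]
    return mus, sigmas


def is_aligned(
    landmark: Sequence[float],
    mus: Sequence[float],
    sigmas: Sequence[float],
    max_dev: float,
) -> bool:
    """Assumed condition: |l_i - mu_i| <= max_dev * sigma_i for every coordinate i."""
    return all(
        abs(value - mu) <= max_dev * sigma
        for value, mu, sigma in zip(landmark, mus, sigmas)
    )
```

Under this reading, a cropped or shifted face moves at least one coordinate far from its reference mean and therefore fails the check, which is consistent with the limitation noted above.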
Inversion
Additionally, we leverage GAN inversion \parencitexiaGANInversionSurvey2023 as an assistance tool. For a given input image, this method finds the latent code which reconstructs the original input when passed through the generator. We use the provided implementation by \textcitekarrasAnalyzingImprovingImage2020 to invert images using StyleGAN2. Previous work has shown that generated images can be reconstructed more successfully than real images \parencitealbrightSourceGeneratorAttribution2019,karrasAnalyzingImprovingImage2020,pasquiniIdentifyingSyntheticFaces2023 (we provide a visual example in Appendix B). Note that inversion also relies on facial alignment. If an adversary uses a cropped version of a fake face, the inversion result will be distorted. We therefore only use inversion as labeling assistance if the image is aligned.
4. Evaluation
In this section, we proceed with an evaluation of our proposed methodology in a controlled setting with labeled data. This allows us to verify the components of our detection pipeline before studying generated faces in the wild on Twitter in Section 5 and analyzing the corresponding profiles and tweets in Section 6.
Dataset Splits
We randomly split , , and into train, validation, and test images, respectively. and are split in the same manner. As we use and only for evaluation, they only contain the corresponding test images, respectively.
Pre-Filter
We start with the pre-filter that should discard irrelevant images, but keep potentially generated images. An image passes if a face (a) is detected and (b) has a sufficient size (see Section 3.2). In the following, we apply to the test set from and to randomly sampled images from . For the latter subset, we manually label each image according to whether it (partly) contains a human face. Our experiment here has three goals: we want to verify that all generated images from pass , confirm that the face detector works reliably on the in-the-wild images from , and finally get an estimate of the number of in-the-wild images that are kept and passed to the next stage.
Table 2 shows the results for our evaluation of . All generated images from pass , fulfilling our first goal. Among the sampled in-the-wild images from with a face, the face detector correctly identifies . We manually look through the undetected faces. In most cases, the face is either very small, obstructed (e.g., by masks or smartphones), or partly outside the frame. The face in these images is not prominent, so we consider it acceptable to skip them. For in-the-wild images without a face, the face detector mistakenly locates a face in of the cases. We manually inspect the mislabeled images. The vast majority contains faces, but they are drawn, digitally created, or belong to animals or statues. Only very few detections are obviously “wrong”, such as images with Twitter’s former default profile image. Since these images are just passed to the next stage, having some false positives is not critical. Based on this analysis, we can conclude that the face detector works reliably, fulfilling our second goal. Finally, we measure how many images from additionally pass the size check and therefore . In only of the face images and of the non-face images, the face is considered large enough, considerably reducing the number of images passed to the next stage. Overall, we conclude that our pre-filter allows us to skip irrelevant images efficiently, without mistakenly discarding generated faces.
With Face | Without Face | |||
---|---|---|---|---|
Dataset | Face Detected | Size Check | Face Detected | Size Check |
— | — | |||
Classification
Next, we verify that our classifier is capable of spotting generated images in realistic settings. We evaluate the performance of our classifier under three conditions: (a) processed images ( vs. ), (b) zoomed images ( vs. ), and (c) proxy-labeled real and fake images ( vs. ). We use the test set from each dataset.
Figure 1 shows the respective ROC curves. Our classifier has an almost perfect detection rate with an AUC value close to . Note that the setup on zoomed data is slightly more challenging, because there are no examples of zoomed images in the detector’s training data. Still, the error rate remains very small. Due to the strong class imbalance on real Twitter data, a small error rate is required to avoid an excessive amount of false positives.
Assistance Tools
We finally verify our methods that allow us to better label images for the error-rate estimation later.
Alignment. Using the test set from , we confirm that with , all fake images are correctly labeled as being aligned. From the randomly sampled images from , only 35 are aligned.
Inversion. We first verify that generated images can be inverted more accurately than real images. We invert 500 images from and , respectively, and compute the LPIPS \parencitezhangTheUnreasonableEffectiveness2018 distance between original and reconstructed images. This distance metric measures the perceptual similarity between two images and has been previously used to estimate the reconstruction quality \parencitekarrasAnalyzingImprovingImage2020. The histograms in Figure 2 show that the reconstructions from are perceptually more similar to the originals compared to the reconstructions from . A classification based on the LPIPS scores results in an AUC of 0.97.
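How an AUC can be derived from reconstruction distances is sketched below. The code assumes that a lower distance indicates a generated image and uses the rank-based (Mann-Whitney) formulation of the AUC: the probability that a randomly drawn fake image has a lower distance than a randomly drawn real image, with ties counted as one half. The function name and the toy values are hypothetical. The LPIPS computation itself is not shown.

```python
from typing import Sequence


def auc_lower_is_fake(fake_dists: Sequence[float], real_dists: Sequence[float]) -> float:
    """Rank-based AUC for a score where smaller values indicate 'fake'."""
    if not fake_dists or not real_dists:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for fake in fake_dists:
        for real in real_dists:
            if fake < real:
                wins += 1.0  # fake image reconstructed better than real image
            elif fake == real:
                wins += 0.5  # ties count as half a win
    return wins / (len(fake_dists) * len(real_dists))


# Toy distances, not taken from the study.
toy_auc = auc_lower_is_fake([0.10, 0.35], [0.30, 0.40])
```

The pairwise formulation is quadratic in the number of samples, which is unproblematic for a few hundred images per class.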
As a second experiment, we check that inversion is helpful for manual labeling. We divide the images (500 real, 500 fake) into 900 training and 100 test images. For each image, we construct a side-by-side view with the original, its reconstruction obtained by inversion, and the distance measured in LPIPS and MSE (cf. Appendix B). Using the training set, one annotator practices the manual classification. We then evaluate the performance based on the held-out test images. 99 out of 100 images are correctly assigned, demonstrating that manual inspection is feasible.
We emphasize that images from are very similar to images from . In contrast, most in-the-wild profile images are visually different, leading to even worse reconstructions (despite being aligned) and thus to comparatively high LPIPS values. Hence, we expect the actual manual labeling process to be easier than in the controlled setting.
Summary
Our evaluation indicates a valid detection pipeline. The pre-filter allows skipping irrelevant images while the classifier allows detecting generated images. The assistance tools can help with the manual labeling process.
5. Detecting Generated Images In the Wild
Equipped with a valid detection pipeline, we can now explore the prevalence of generated images on Twitter. To this end, we first need to calibrate the detection to the real-world setup before we can present the final results.
Manual Labeling
To begin with, we have to label a subset of . First, this allows us to get a detection threshold, so that we are able to classify individual images as real or generated. Note that in the controlled setup before, we evaluated the overall performance of with the AUC metric that takes into account all possible thresholds and thus does not require picking a specific value. Second, a separate labeled set is necessary to estimate error rates.
Unfortunately, manual labeling of all samples in is unfeasible due to the sheer volume of samples within the dataset. Thus, we resort to a random subset, containing of all samples. We then sort these images based on their score (from ) from low (real) to high (fake) and select the top images that pass for manual labeling. We acknowledge that choosing the subset based on the classifier that we are trying to evaluate introduces an unwanted bias: there could be fake images with very low scores that are overlooked. However, we argue that this approach strikes a balance between practicability and a sound estimation. Selecting the subset by pure chance would require an enormous amount of manual labeling to gather a sufficient number of fake images. Moreover, the scores of our subset range from 1.0 to 0.33. From the test set of , only 3 out of images get a score below 0.33. We therefore assume that only a very small number of false negatives is potentially overlooked.
We carefully inspect each image and, if it is aligned, its reconstruction from GAN inversion. We label an image as real if the framing and pose do not match with that of , if it contains a complex and meaningful background, or if the reconstruction deviates significantly from the original. In contrast, images are labeled as fake if they contain diffuse backgrounds, asymmetries (eyes, earrings), unnatural clothing, color artifacts, and/or an almost perfect reconstruction. By doing so, we obtain 185 images labeled as “Real”, 725 images labeled as “Fake”, and 90 images labeled as “Unsure”. Most images with label “Unsure” resemble images from TPDNE, but do not contain clear artifacts or were strongly edited. We also assigned this label if we suspect that an image was generated using a different kind of generative model. We randomly split the 910 images labeled as “Real” or “Fake” into a validation set (for calibrating the threshold) and a test set (for estimating the error rates) of equal size, maintaining the label ratio in both splits.
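The split into a validation and a test set of equal size with preserved label ratio can be sketched as follows. This is an illustrative stratified split, not necessarily the procedure used in the study. The function name, the fixed seed, and the rule that a leftover sample of an odd-sized class goes to the currently smaller split are assumptions.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, str]  # (image identifier, label)


def stratified_half_split(samples: Sequence[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into two halves while keeping the label ratio in both."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)

    validation: List[Sample] = []
    test: List[Sample] = []
    for group in by_label.values():
        rng.shuffle(group)
        half = len(group) // 2
        validation.extend(group[:half])
        test.extend(group[half:2 * half])
        if len(group) % 2 == 1:
            # Odd class size: give the leftover sample to the smaller split.
            target = validation if len(validation) <= len(test) else test
            target.append(group[-1])
    return validation, test


# Label counts as reported above: 185 "Real" and 725 "Fake".
labeled = [(i, "Real") for i in range(185)] + [(i, "Fake") for i in range(185, 910)]
validation_set, test_set = stratified_half_split(labeled)
```

With 185 real and 725 fake images, both class sizes are odd, so the two leftover samples are distributed across the splits and each split ends up with 455 images.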
Choosing a Threshold
Due to the high imbalance between real and generated images in , choosing an appropriate threshold is not trivial. A threshold that is too high leads to many overlooked fake images (low recall), while a threshold that is too low leads to many real images classified as fake (low precision). As recall and precision are equally relevant in our setting, we follow the common practice of selecting the threshold that maximizes the F1-score on the validation set. The best F1-score (0.9832) is achieved using a threshold of 0.9899361. Such a high threshold might appear counterintuitive. Yet, Figure 3 shows that most fake images are confidently classified as fake, with scores very close or equal to 1. The scores of real images, however, have greater variation. Thus, choosing a relatively high threshold gives the best performance. Note that the scores of real images in this subset are not representative of all real images, since we purposely selected images with high scores.
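Threshold selection by F1-score can be illustrated with a small sketch. It assumes that label 1 denotes fake, that an image is classified as fake when its score is greater than or equal to the threshold, and that every observed score is a candidate threshold. On ties, the lowest such threshold is kept. The function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def f1_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1-score of the rule 'score >= threshold means fake' (label 1 = fake)."""
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        predicted_fake = score >= threshold
        if predicted_fake and label == 1:
            tp += 1
        elif predicted_fake and label == 0:
            fp += 1
        elif not predicted_fake and label == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_f1_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, F1) for the candidate threshold with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for candidate in sorted(set(scores)):
        f1 = f1_at(labels, scores, candidate)
        if f1 > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Toy validation data: fake scores cluster near 1, real scores vary more.
toy_labels = [0, 0, 1, 1, 1]
toy_scores = [0.20, 0.95, 0.99, 1.00, 1.00]
toy_threshold, toy_f1 = best_f1_threshold(toy_labels, toy_scores)
```

In the toy data, one real image scores 0.95, so any threshold at or below that value costs precision, while a threshold of 1.0 loses one fake image. This mirrors why a threshold close to 1 can be optimal when fake scores concentrate at the top of the range.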
Estimating Error Rates
Equipped with our selected threshold, we can now estimate the error rates of our detector. We start with the test set of our manually labeled subset and calculate the FNR and FDR here. The FNR, the fraction of mislabeled fake images, is . The FDR, the fraction of real images among all images classified as fake, is .
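The two error rates follow directly from the confusion counts. The sketch below uses hypothetical toy labels and a hypothetical helper name; it only illustrates the definitions given above, FNR = FN / (FN + TP) and FDR = FP / (FP + TP).

```python
def error_rates(y_true, y_pred):
    """Return (FNR, FDR) for binary labels, where 1 denotes a fake image.

    FNR: fraction of fake images that are labeled as real.
    FDR: fraction of real images among all images classified as fake.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    fdr = fp / (fp + tp) if (fp + tp) else float("nan")
    return fnr, fdr

# Toy example (not data from the study): four fakes, two reals.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
fnr, fdr = error_rates(y_true, y_pred)
```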
To understand the errors, we take a closer look at the misclassified images. Figure 4 shows the false negatives within the test set together with their scores. Although the majority actually receives a high score and is only classified as real due to the high threshold, three images have a considerably lower score. We cannot identify a pattern that causes their misclassification. Neither do we observe any characteristics that would explain the real images classified as fake (FDR). As these profile images belong to real users, we cannot provide visual examples here.
In addition, we can leverage our independent dataset of fake profile images that were spotted by users on the web before. We obtain a low FNR of , that is, out of fake profile images are incorrectly labeled as real. All images pass the prefilter . All images in are aligned according to our definition. False negatives therefore only depend on the classifier’s score.
Overall, we can confirm the performance of our detector on two different test sets. While the error rates are not zero, they are small enough to draw conclusions in our analysis in the next section.
Prevalence of Fake Profiles on Twitter
We are now ready for the final step. We apply our detection scheme on the entire in-the-wild dataset . The pre-filter discards images, reducing the number of images by . Next, using our detector, we classify profile images as fake. This is of the full dataset. In the next section, we analyze the profiles behind these images and their tweets in more detail.
6. Analysis
Our goal in this section is to understand the context where the generated profile images are used. To this end, we first perform an analysis of the accounts behind these images (Section 6.1). Then, we thoroughly analyze the content of the tweets that were sent from these accounts (Section 6.2). For simplicity, we refer to accounts using generated profile images as “fake-image accounts” as opposed to “real-image accounts” in the following.
6.1. User Metrics
We begin by analyzing the difference between fake-image and real-image accounts regarding social connections, account activity, as well as account creation and status.
Social Connections
On Twitter, social interactions are primarily measured in the number of followers an account has and the number of other accounts it follows. Figures 5(a) and 5(b) visualize the distribution of these metrics for real- and fake-image accounts at the time of data collection. We find that fake-image accounts have fewer followers (mean: 393.35, median: 60) compared to real-image accounts (mean: , median: 165) in our dataset. () of all fake-image accounts have or fewer followers and () have exactly zero followers. We notice that fake-image accounts () have exactly 106 followers. Our content analysis in Section 6.2 reveals that these accounts belong to a large cluster of fake accounts involved in coordinated inauthentic behavior.
We find that fake-image accounts also follow fewer other accounts (mean: 283.18, median: 21) compared to real-image accounts (mean: 759.83, median: 262). Interestingly, fake-image accounts () follow exactly two other accounts. In contrast to the number of followers, a relatively small number of fake-image accounts (, ) follows exactly zero other accounts.
Activity
Figure 5(c) shows that fake-image accounts do participate in Twitter based on the number of tweets. Yet, they are overall less active than real-image accounts. On average, fake-image accounts posted (median: 112) tweets, as opposed to (median: 3450) tweets from real-image accounts. () of all fake-image accounts have 10 or fewer tweets. In addition, Figure 5(d) shows the average number of tweets per day, calculated by dividing the total number of tweets by the number of days the account exists. Based on the median, fake-image accounts are still less active than real-image accounts (0.95 vs. 3.7 tweets per day). However, a large fraction of fake-image accounts posts exceptionally many tweets per day, causing a higher mean (19.96 vs. 13.56 tweets per day). In particular, there are 266 fake-image accounts () that submitted more than 100 tweets per day.
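The tweets-per-day metric and the gap between median and mean can be sketched as follows. The concrete collection date, the toy rates, and the lower bound of one day for the account age are assumptions added for illustration.

```python
from datetime import date
from statistics import mean, median

def tweets_per_day(total_tweets, created_on, collected_on):
    """Average number of tweets per day of account existence.

    Assumption: the account age is clamped to at least one day so that
    accounts created on the collection day do not cause a division by zero.
    """
    days = max((collected_on - created_on).days, 1)
    return total_tweets / days

# Hypothetical account created on February 16, 2023, observed 20 days later.
rate = tweets_per_day(2100, date(2023, 2, 16), date(2023, 3, 8))

# A few highly active accounts raise the mean far above the median.
toy_rates = [0.5, 0.9, 1.0, 2.0, rate]
toy_median = median(toy_rates)
toy_mean = mean(toy_rates)
```

The toy values show the same effect as reported above: the median stays low, while a single account with more than 100 tweets per day dominates the mean.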
Account Creation and Status
Figure 6(a) compares the times of account creation. Fake-image accounts are considerably “younger”, with more than half of them () being created in 2023 (note that our data collection happened in March 2023). In contrast, only of real-image accounts have been created in this period.
In addition to the creation date, we also examine the account status after a certain period of time. We checked the status of all fake-image accounts nine months after data collection by querying the respective profile page. As a reference, we did the same for an equal number of randomly sampled real-image accounts. Accounts can be either alive, deactivated (by the user), or suspended (by Twitter). Figure 7 illustrates that more than half of the fake-image accounts () have been suspended. In contrast, only of real-image accounts in the reference set have been suspended. The high number of suspended fake-image accounts suggests that they were violating Twitter’s rules.
In Figures 6(b) and 6(c), we analyze the account creation of fake-image accounts given their status. We observe various suspended accounts that were created in bulk just shortly before our data collection, especially in the middle of February. Note that we do not know when these accounts were suspended, so that we cannot determine the effective lifetime of these accounts.
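A simple way to surface bulk creation is to count accounts per creation day and report days above a chosen minimum. The helper name, the cutoff, and the toy dates below are hypothetical and only illustrate the idea.

```python
from collections import Counter
from datetime import date

def bulk_creation_days(creation_dates, min_accounts):
    """Return (day, count) pairs for days with at least min_accounts new accounts."""
    per_day = Counter(creation_dates)
    return sorted((day, n) for day, n in per_day.items() if n >= min_accounts)

# Toy creation dates (not data from the study).
creation_dates = (
    [date(2023, 2, 16)] * 3
    + [date(2023, 2, 17)] * 4
    + [date(2023, 1, 5), date(2021, 6, 1)]
)
bulk_days = bulk_creation_days(creation_dates, min_accounts=3)
```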
Takeaways
Our analysis shows that real-image and fake-image accounts notably differ. Fake-image accounts have fewer social interactions, both regarding the number of followers and the number of accounts they follow. While these metrics are distributed evenly for real-image accounts, we observe patterns with fake-image accounts. There are large groups with identical values, indicating an orchestrated network of inauthentic users. Moreover, fake-image accounts are not passive: they participate considerably in Twitter based on the number of tweets. Although they are in general less active than real-image accounts, there are several fake-image accounts that post very frequently, hinting at spamming attacks. Finally, fake-image accounts have a more limited lifetime. They are usually created more recently than real-image accounts, and they are also suspended by Twitter disproportionately often. This suggests inauthentic behavior. Moreover, a substantial number was created in bulk just before our data collection period started. This bulk creation (or batch creation) is a common pattern for inauthentic behavior, used, for example, to amplify messages or to participate in spamming or trolling activities \parencitegurajalaFakeTwitterAccounts2015,ferraraTwitterSpamFalse2022.
6.2. Content Analysis
To evaluate the purpose of the identified fake-image accounts, we proceed to analyze their tweets (original as well as retweets) posted in 2023. We utilize data collected in the context of a large-scale Twitter stream archiving effort \parencitefafalios2018tweetskb based on Twitter’s sampled stream (the same we used to create ). This allows us to access information about the activity of the profiles before and after the profile collection week (until Twitter restricted access to its API in June 2023). In total, we have access to tweets from the fake-image accounts in our collection.
We begin our analysis with the language and availability. The upper half of Figure 8 shows a breakdown of the number of tweets per language. Using the accounts’ status nine months after our data collection (cf. Section 6.1), we can also calculate the fraction of unavailable tweets. Overall, of all tweets were unavailable after nine months. Interestingly, Turkish and Arabic stand out as languages with significantly higher unavailability rates ( and , respectively) than other languages. The number of unique accounts that created the tweets in each language is reported in the lower half of Figure 8. It shows that Turkish tweets, for instance, stem from a relatively small number of users.
We proceed with a textual analysis. To identify structural patterns, we employ state-of-the-art sentence embeddings \parencitereimers-2019-sentence-bert to group the tweet texts into semantically related clusters. We utilize the cosine similarity between the sentence embeddings to determine cluster membership. A new observation (tweet) is assigned to an existing cluster if a certain similarity threshold (in our case 0.6) is reached. Otherwise, a new cluster is created. Furthermore, we limit our analysis to clusters that exhibit a minimum cluster size of 50 (i.e., at least 50 tweets should be in one cluster). This approach allows us to identify dominant trends. Note that it does not provide a distribution of topics, because not every tweet is assigned to a cluster. For the purpose of visualization, we use UMAP \parencitemcinnesUMAPUniformManifold2020 as a dimensionality reduction technique to generate a two-dimensional representation of the clustering outcome (cf. Figure 9). For each cluster, we calculate the class-based term frequency–inverse document frequency (TF-IDF) terms to determine representative class tokens. In a subsequent step, we conduct a manual qualitative review of all clusters to identify and describe common themes, which are detailed in the following paragraphs. We describe the general cluster contents and provide representative examples for important topics. We also analyze the metadata of accounts within each cluster and report unusual characteristics.
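The threshold-based assignment can be sketched as a greedy single-pass procedure. The text does not state what a new tweet is compared against, so comparing with the normalized cluster centroid is an assumption, as are the function names and the toy four-dimensional vectors that stand in for real sentence embeddings. The threshold of 0.6 and the minimum cluster size of 50 are taken from the text.

```python
import math
import random

def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def threshold_clustering(embeddings, threshold=0.6, min_size=50):
    """Greedy clustering: join the most similar cluster or open a new one.

    Assumption: similarity is the cosine between the tweet embedding and
    the cluster centroid (the normalized sum of its member embeddings).
    Returns lists of member indices for clusters with >= min_size members.
    """
    sums, members = [], []
    for idx, raw in enumerate(embeddings):
        vec = normalize(raw)
        best_cluster, best_sim = None, -1.0
        for c, total in enumerate(sums):
            sim = dot(normalize(total), vec)
            if sim > best_sim:
                best_cluster, best_sim = c, sim
        if best_cluster is not None and best_sim >= threshold:
            sums[best_cluster] = [s + v for s, v in zip(sums[best_cluster], vec)]
            members[best_cluster].append(idx)
        else:
            sums.append(vec)
            members.append([idx])
    return [m for m in members if len(m) >= min_size]

# Toy embeddings: two tight groups plus two isolated outliers.
rng = random.Random(0)

def noisy(base, n):
    return [[b + rng.gauss(0, 0.05) for b in base] for _ in range(n)]

embeddings = noisy([1, 0, 0, 0], 60) + noisy([0, 1, 0, 0], 55)
embeddings += [[0, 0, 1, 0], [0, 0, 0, 1]]
clusters = threshold_clustering(embeddings)
sizes = sorted(len(c) for c in clusters)
```

The two outliers form singleton clusters and are dropped by the size filter, which is why such a procedure yields dominant trends rather than a full topic distribution.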
English (Unavailable tweets)
The clustering for English content posted by users that are not on the platform anymore reveals a notable pattern: we observe a single, extremely large cluster that encompasses of all unavailable English tweets. Despite the variability of the actual content, these tweets all share a common structure. Each tweet begins by mentioning a specific Twitter user, followed by a short sequence of English terms. Interestingly, these sequences do not form logical sentences, so they are neither semantically nor syntactically correct. Each of these sentences is then followed by a specific Chinese hashtag that can be translated to: “This is really useful”. Unfortunately, we can only speculate about their purpose. Our hypothesis is that the embedded hyperlinks within the tweets may have directed users to malicious external websites. As the links are no longer functional, we cannot verify this hypothesis.
The accounts’ metadata corroborate the assumption that the accounts within this cluster were part of an organized network. All but three were created between February 16 and February 20, which is consistent with our observations in Figure 6(c). Up to 754 accounts were created on a single day. We also find that this cluster contributes to the large number of accounts with identical social connections (cf. Figure 5(a) and Figure 5(b)). have exactly 106 followers and follow exactly two other accounts. The usernames (Twitter handles) appear to be constructed from a list of German-sounding first names and last names (or initials), and optionally one or multiple digits (e.g., @GuntherForstner86). of all accounts have the same display name that can be translated to “Noon Namshi Sivvi discount code is strong and effective” (Noon and Namshi are e-commerce platforms operating in the Arabic region). These accounts also have their location set to “KSA” (Kingdom of Saudi Arabia). Moreover, the accounts contain nonsense descriptions like “Personal west service street laugh small.”. We hypothesize that these were automatically generated or translated. Again, we can only speculate about the reason, especially about the mixed use of English, German, Arabic, and Chinese language. We also notice that of all accounts in this cluster use profile images that are duplicates within our dataset of fake-image accounts. Appendix D elaborates our method for identifying duplicate images.
The remaining clusters mainly focus on giveaways, often related to cryptocurrencies, with tweets like
$50 (2 winners x $25) 24 hours - like, follow)
i will #giveaway 100 usdt worth of $loop as we celebrate our 10k milestone
or the promotion of illegal content such as links to broadcasting streams of soccer matches, e.g.,
live stream arouca vs benfica live [link]
Another trend is the distribution of links to websites and Telegram groups containing explicit content, e.g.,
follow for more [link]
English (Available tweets)
The clustering of tweets from users who were still active after nine months reveals similarities and differences. Figure 9 depicts a visual representation of the top clusters with their representative text tokens. A significant portion of all clusters is again related to various forms of cryptocurrency, stocks, and giveaways, e.g.,
drop your #tezos #nft if you need it sold!
15000$ in $eth — 5 lucky winners!
Additionally, we find a significant share of adult content/porn related clusters, actively advertising explicit content, also through dedicated patterns like
beautiful/charming/etc. [profile of porn actress] [link]
Compared to inactive users, we observe that available accounts also engage in discussions on contentious or political issues. These include, for example, the war in Ukraine, election-related discourse, and debates on COVID and vaccinations:
welcome to nazi ukraine #russia
desantis racks up wins while trump, potential 2024 opponents take swipes at florida governor
albos crocodile tears: watch this video, that the main-stream media refuses to show.
someone needs to find an antidote for the vaxxx
Turkish (Unavailable tweets)
For the Turkish accounts, we restrict our analysis to content posted by users who have been removed, since this is the majority of the dataset. Our findings indicate that all of this content is related to pornography or escort services. The clusters differ primarily in the cities mentioned within the posts. Most tweets also contain links to other websites, which are no longer functional. Upon examining the metadata of all 932 accounts, we again identify the systematic pattern for usernames that we already observed in the large cluster of English tweets. However, first and last names appear to be of Turkish descent. Moreover, almost all accounts have their location set to a real Turkish city. of all accounts again use duplicate profile images and were created within one month. These findings again indicate that at least some systematic approach (automatic or semi-automatic) is used to generate the accounts.
Arabic (Unavailable tweets)
For Arabic tweets, we again only consider accounts that have been suspended. All clustered tweets appear to be related to literature, with individual clusters being characterized by mentions of certain authors, countries, or topics, all related to the Arabic region. These tweets make up of all unavailable Arabic tweets. Surprisingly, the tweets share a common structure with those from the large cluster of English tweets: they contain the specific Chinese hashtag, an external link, and an incoherent sentence. Our metadata analysis suggests that the accounts indeed belong to the same cluster, despite the different language. Almost all accounts were created between February 16 and February 20, with 892 being created on a single day. We observe the same anomalies regarding the (German) usernames, locations, descriptions, and social connections. Given the book-related content and the frequently occurring username that promotes a discount code, we hypothesize that the external links might have referred to the respective shopping platforms.
Takeaways
Our content analysis reveals that English, Turkish, and Arabic are the dominant languages used by the fake-image accounts in our collection. We identify large networks of fake-image accounts that were probably automatically created and that participated in large-scale spamming attacks. We observe recurring patterns as part of the automation. Accounts are created in bulk. Tweets, usernames, locations, descriptions, and social connections follow a systematic pattern. Multiple accounts within a network share the same profile image. Furthermore, our analysis shows that frequently occurring topics are cryptocurrencies, giveaways, and content related to pornography and escort services. Fake-image accounts also participate in controversial political discussions. These findings align with prior analyses of inauthentic content on Twitter \parenciteratkiewiczDetectingTrackingPolitical2011,cresciDecadeSocialBot2020,nizzoliChartingLandscapeOnline2020,pfefferJustAnotherDay2023.
6.3. Sample Study on Available Accounts
Finally, we analyze the current behavior of fake-image accounts that are still alive at the time of writing (February–March 2024). This gives insights into the use cases of long-lived fake accounts. As we can no longer use data from Twitter’s API, we randomly select available fake-image accounts and visit their Twitter profile manually. Two annotators independently check the most recent tweets and assign a topic to each profile (Cohen’s kappa: 0.84). Accounts on which the annotators disagree are revisited. We choose topics from five categories, so that we can get a broad understanding of the prevalent application scenarios.
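For reference, Cohen’s kappa relates the observed agreement between two annotators to the agreement expected by chance. The sketch below uses toy annotations; the category names “Politics”, “Finance”, and “Other” are placeholders chosen for illustration, and the sketch does not handle the degenerate case in which the expected agreement equals 1.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: product of the annotators' marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy annotations for ten profiles (not data from the study).
annotator_a = ["Politics"] * 3 + ["Finance"] * 3 + ["Business"] * 2 + ["Sex", "Other"]
annotator_b = ["Politics"] * 2 + ["Finance"] * 4 + ["Business"] * 2 + ["Sex", "Other"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

Here the annotators agree on nine of ten profiles, the chance agreement is 0.24, and kappa is therefore 0.66 / 0.76, roughly 0.87.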
Figure 10 depicts the distribution of topics. The majority of fake-image accounts participates in the political discourse () or shares finance-related content (), mostly related to cryptocurrencies. of the profiles revolve around other websites or products (“Business”), while share explicit content or promote escort services (“Sex”). The remaining accounts () cover diverse topics or have an empty timeline. Taken together, we observe similar topics as before in our cluster analysis.
6.4. Summary
Our systematic analysis revealed Twitter accounts that use AI-generated profile images. By analyzing both their user metrics and the content of their tweets, we identify particular patterns. Some of these patterns, like the high number of suspended accounts, striking similarities within the accounts’ properties, or the multitude of similar tweets posted by different users, represent strong evidence that a subset of these accounts is part of organized, inauthentic networks. While many accounts amplify content related to cryptocurrencies or pornography, we also observe accounts that express controversial political opinions.
7. Ablation Study
Before finishing our study, we briefly confirm the design choices of the classification methodology proposed in Section 3. In particular, we justify the need to train our own classifier (Section 7.1) and study the impact of training data (Section 7.2).
7.1. Evaluation of Pre-Trained Detectors
Detecting GAN-generated images is a well-researched problem and several pre-trained detectors have been proposed (cf. Section 2). However, we observe that the performance of these detectors suffers from Twitter’s image processing, making it necessary to directly train a classifier on processed profile images.
Setup
We test three existing pre-trained classifiers: \parencitewangCNNgeneratedImagesAre2020 (which is the basis of our classifier), \parencitegragnanielloAreGANGenerated2021, and \parenciteojhaUniversalFakeImage2023. Appendix C provides more details on these three classifiers. We evaluate four conditions (see Table 3). The conditions (a)-(c) correspond to those in Figure 1 and all use images processed by Twitter. We additionally test the pre-trained detectors on unprocessed images in condition (d).
Results
Table 3 shows the AUCs of the three classifiers compared to . Our trained detector significantly outperforms the pre-trained classifiers under the Twitter conditions (a)-(c). The fact that the latter perform better under the clean condition (d) demonstrates the strong effect of Twitter’s processing. It is therefore not possible to use a pre-trained detector for our study of in-the-wild profile images.
Condition | Wang et al. | Gragnaniello et al. | Ojha et al. | Our detector |
---|---|---|---|---|
(a) vs. | 0.7279 | 0.9249 | 0.6405 | 0.9998 |
(b) vs. | 0.7243 | 0.9600 | 0.6338 | 0.9997 |
(c) vs. | 0.8713 | 0.9015 | 0.6922 | 0.9998 |
(d) vs. | 0.9466 | 1.0000 | 0.8296 | — |
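The AUC values in Table 3 can be read as the probability that a randomly chosen fake image receives a higher score than a randomly chosen real image. The following sketch computes this quantity by pairwise comparison, counting ties as one half; the scores are toy values and the function name is hypothetical.

```python
def pairwise_auc(scores_fake, scores_real):
    """AUC as the fraction of (fake, real) pairs ranked correctly.

    Ties between a fake and a real score count as half a correct pair.
    Quadratic in the number of images, which is fine for small samples.
    """
    correct = 0.0
    for f in scores_fake:
        for r in scores_real:
            if f > r:
                correct += 1.0
            elif f == r:
                correct += 0.5
    return correct / (len(scores_fake) * len(scores_real))

# Toy detector scores (not data from the study).
scores_fake = [0.99, 0.95, 0.60]
scores_real = [0.10, 0.70, 0.95]
auc = pairwise_auc(scores_fake, scores_real)
```

In the toy example, 6.5 of the 9 pairs are ranked correctly, which gives an AUC of about 0.72.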
7.2. Effect of Training Data
Our datasets described in Section 3.1 allow for different combinations of training data. In the following, we justify the choice of training our detector on real images, proxy-labeled real images, and fake images.
Setup
We consider three classifier variants and analyze their performance under the three conditions from Figure 1, respectively. The classifier is trained on and and represents the most straightforward option. The images are not processed by Twitter, but we resize them to pixels to match the resolution of actual profile images. The second classifier, , is trained on the same images but with Twitter’s processing. Finally, is additionally trained on as real images.
Results
Table 4 shows the AUCs of the three detector variants under the different conditions. Our finally chosen classifier, , has the highest performance in all conditions. The classifier trained on unprocessed images performs worse than both variants trained on processed images, confirming our findings from Section 7.1. Note that, while an AUC is still very high, the small difference can cause a significant increase in false positives, given the size of . Overall, our results provide two insights. First, the classifier should be trained on images that are processed similarly to the target images. Second, including proxy-labeled real images from the target distribution () improves the detection performance. A closer look shows that this causes a better separation of the classifier scores, shifting scores towards either end of the output range. This motivates our choice of for our study.
Condition | Unprocessed (resized) | Twitter-processed | Twitter-processed + proxy-labeled real |
---|---|---|---|
(a) vs. | 0.9971 | 0.9998 | 0.9998 |
(b) vs. | 0.9953 | 0.9995 | 0.9997 |
(c) vs. | 0.9983 | 0.9994 | 0.9998 |
8. Discussion and Limitations
Our work systematically examines the prevalence of generated images on Twitter. Despite great effort, our study has limitations that we discuss in the following.
Sampling Bias
The restricted access to Twitter as well as the limited and randomly chosen collection period can introduce a sampling bias \parenciteArpQuiPen+22 to our study. In particular, the presence of several large clusters with seemingly orchestrated accounts in our collected dataset has a significant effect on our analysis. These clusters and their concrete topics are expected to change over time. Nevertheless, the characteristics, such as the bulk creation of accounts, should generally apply. The same holds for high-level tendencies, such as political amplification or spamming. These are also in line with prior observations on Twitter misuse \parenciteratkiewiczDetectingTrackingPolitical2011,cresciDecadeSocialBot2020. Finally, we note a possible bias due to the restructuring of Twitter/X after the takeover by Elon Musk. It is possible that with the rise of hate speech and bots \parencitehickeyAuditingElonMusk2023, the prevalence of generated profile images has also increased. Unfortunately, the current API limits impede a replication of our analysis.
Selection Bias in Analysis
A full analysis of all tweets is beyond the scope of our work. Thus, our cluster analysis is not exhaustive and only focuses on the prevalent trends. Still, this allows us to identify the primary contexts in which generated images are used on Twitter, so that we can draw general conclusions on topics.
Focus on Images Generated from TPDNE
We focus on facial images from TPDNE that are generated by StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020. Although we are unable to provide statements regarding the prevalence of other types of generated images on Twitter, we expect to cover the most prevalent type. TPDNE has made it considerably easier to access generated images compared to other generative models. Several reports confirm that GAN-generated faces are in fact used by fake social media accounts \parencitenimmoOperationFFSFakeFace2019,nimmoOperationNavalGazing2020, nimmoIRAAgainUnlucky2020, graphikateamStepMyParler2020, stanfordinternetobservatoryReplyguysGoHunting2020,nimmoSpamouflageGoesAmerica,stanfordinternetobservatoryAnalysisTwitterTakedowns2020,strickAnalysisProchinaPropaganda2021,graphikateamFakeClusterBoosts2021,williamsPortraitModeGAN2022. Moreover, most alternative models need to be deployed locally. This requires technical knowledge and possibly specialized hardware. Although text-to-image models like Stable Diffusion or Midjourney can be accessed through a browser, generating images at scale may require significant time and additional costs. Achieving good images can require multiple attempts and services like Midjourney require payment. Finally, we note that detecting all kinds of generated images, especially in a real-world setting where images are heavily processed, is still an open challenge \parencitegragnanielloAreGANGenerated2021,corviDetectionSyntheticImages2023. Therefore, we focus on one setting where we aim at developing a highly reliable detector.
Likelihood of Overlooked Fake Profiles
As discussed in Section 5, classifying in-the-wild data always requires trading off the number of overlooked fakes against the number of falsely detected real images. While we make our best efforts to evaluate the performance of our detection pipeline under realistic conditions, we cannot exclude that the actual FNR is higher than our estimate. Fake profile images that underwent unusual processing could potentially bypass our detector. Furthermore, our tool-assisted manual labeling process is not guaranteed to be error-free. However, as the FNR on the independent dataset closely matches our estimate, the likelihood of overlooked generated images should be low.
9. Related Work
Studying generated faces on social media touches different research areas. In the following, we examine related methods and concepts.
Detecting Generated Images on Social Media
Despite the plethora of proposed fake image detection methods (cf. Section 2), there is only little work on detection in real-world settings. \textciteboatoTrueFaceDatasetDetection2022 create a synthetic dataset of processed images by sharing real and generated images on different social media platforms. They find that a classifier trained on “original” images is not able to effectively detect shared images, unless it is fine-tuned. This confirms our results in Section 7. In a related work \parencitemarconDetectionManipulatedFace2021, the same methodology is applied to deepfake videos, yielding similar findings. \textcitesabelDetectingGeneratedMedia2021 present an approach to detect generated text and profile images on Twitter. They collect tweets related to controversial topics (e.g., COVID-19) and separately classify the tweet’s text and the corresponding profile image. Their method can detect generated media but is highly sensitive to the selected thresholds. High-precision thresholds cause a significant decrease in true positives, resulting in many overlooked generated images.
Closest to our work is the concurrent preprint by \textciteyangCharacteristicsPrevalenceFake2024. They estimate the prevalence of generated profile images on Twitter based on randomly sampled accounts using their proposed GANEyeDistance metric. This metric relies on StyleGAN2’s facial alignment by computing the distance between the actual and expected eye location. Their evaluation shows a FDR of , so that each detected image has to be checked manually. In Appendix E, we describe their method in more detail, reproduce it, and compare it with our method. We find that their approach is also vulnerable to simple geometric transformations, making it more likely to overlook generated faces. In contrast, we use a larger dataset and build a more robust detection method. Based on the results, \textciteyangCharacteristicsPrevalenceFake2024 estimate a lower bound of – active Twitter accounts that use GAN-generated profile images. Our estimated rate with is slightly higher, which we attribute to our higher detection performance and the fact that we discard accounts with Twitter’s default profile image.
Human Perception of Generated Social Media Profiles
Since it is unlikely that artificially generated social media profiles can be prevented completely, studying their effect on humans and our society is crucial. \textciteminkDeepPhishUnderstandingUser2022 conduct a user study to measure users’ trust towards such profiles in a social engineering context. They find that users are likely to accept a connection request from a LinkedIn profile using generated faces or texts. Even participants who were explicitly informed about the presence of fake accounts had an acceptance rate of . A similar work \parenciterossiAreDeepLearninggenerated2023, in which participants were asked to label profiles as real or fake in a Twitter-like environment, shows that human performance is almost equivalent to random guessing (). These findings emphasize the need for reliable detection methods for generated content in social networks.
Social Media Studies
Complementary to our work, a large body of interdisciplinary research has focused on the misuse of social media \parencitecresciDecadeSocialBot2020, ferrara_challenges, yardi2010detecting. For example, the 2016 US elections were marked by accusations of opinion manipulation through automated accounts on social media, particularly on Twitter, so that researchers investigated these inauthentic and coordinated campaigns \parencitebessi2016social,badam. In recent years, research has increasingly focused on the harms caused to online communities and the potential to manipulate public sentiment. Studies have extensively explored the roles of disinformation spread, online conspiracy proliferation, and political interference \parencitewang2023,shao2018spread,luceri2019evolution. Another research direction is the identification of inauthentic behavior in the context of financial campaigns \parencitecresci-financial, tardelli. Recently, these research efforts have been facing new challenges given the increasing use of AI-generated content by social bots \parenciteferrara_challenges.
10. Conclusion
Generative AI provides unprecedented capabilities to create deceptively realistic content, be it images, videos, text, or music. Despite their considerable beneficial applications, these methods also raise significant concerns about their harmful effects. On social media, generated images can be misused to create seemingly real accounts that spread, for example, political misinformation or spam. While the detection of generated content has been explored extensively in controlled laboratory settings, there has been limited systematic research on its prevalence on social media. In this paper, we provide the first systematic large-scale study of generated profile images on Twitter. To obtain a reliable detection method, we carefully build a pipeline step by step where we consider different dataset types, pre-filtering, classification, and labeling-assistance methods.
In our dataset of profile images from Twitter, we classify profile images as generated. This is of the dataset, showing that generated profile images are notably present on Twitter. Our analysis of the corresponding accounts and their tweets leads to various insights. Fake-image accounts and real-image accounts differ regarding social connections, account activity, account creation time, and availability rate. For example, many fake-image accounts are created in batches and have identical metadata, indicating that they are part of an organized network. The tweet analysis shows that frequently occurring topics are cryptocurrencies, giveaways, content related to pornography and escort services, as well as controversial political discussions.
In summary, our work introduces a detection method for studying generated content on social media. Our analysis underlines that generated images are used as profile images for a wide range of applications. Addressing this threat will require several steps. First, platforms can implement detection algorithms to flag generated content, as Meta has recently announced \parencitecleggLabelingAIgeneratedImages2024. Second, watermarking methods (e.g., \parencitefernandezStableSignatureRooting2023) that integrate a detectable watermark directly into the generation process can facilitate detection. Finally, raising more awareness about the existence and impact of generated content will be necessary.
Ethics Statement and Data Availability
Working with real-world data from social media carries ethical and privacy-related risks. We take different measures to reduce these risks. In our study, statistics of real accounts are reported in aggregated form. We show personal information, such as profile images and tweet texts, only for accounts using generated images. However, we acknowledge that we cannot completely avoid the risk of falsely labeling a real image as generated.
To foster the development and evaluation of real-world generated image detectors, we plan to share our labeled image datasets. Moreover, to comply with Twitter’s terms of service (ToS), we will release the IDs of users and tweets from our in-the-wild dataset. Due to the recent changes to Twitter’s API, we are aware that accessing the full dataset based on the IDs is challenging. We therefore invite researchers to contact us to discuss further uses of the dataset and potential collaborations.
Acknowledgements.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2092 CASA – 390781972. Moreover, this work was supported by the Leibniz Association Competition (P101/2020) as well as by the IFI program of the German Academic Exchange Service (DAAD) funded by the Federal Ministry of Education and Research (BMBF).
Appendix A Methodology Details
Here, we provide implementation details of our methodology.
A.1. Data Collection
In-the-Wild Dataset
We access the Twitter API using the tweepy \parencitetweepy Python package and download the profile image of each tweet’s author from the respective profile_image_url. Table 5 lists all metadata fields we obtain from the API. The second column denotes how many accounts in have a value in the respective field.
Field | Count | Description |
---|---|---|
id | | Unique user identifier. |
username | | Username (handle). |
name | | Name shown in profile (display name). |
created_at | | Account creation time. |
location | | User-specified location. |
description | | Profile bio. |
url | | User-specified URL. |
profile_image_url | | URL to user’s profile image. |
public_metrics.followers_count | | Number of followers. |
public_metrics.following_count | | Number of accounts user is following. |
public_metrics.tweet_count | | Number of tweets. |
public_metrics.listed_count | | Number of lists containing user. |
protected | | Whether account is private. |
verified | | Whether account is verified. |
withheld.country_codes | | Countries where user is not available. |
pinned_tweet_id | | Identifier of user’s pinned tweet. |
entities.url.urls | | Details about profile website. |
entities.description.mentions | | Details about user mentions in description. |
entities.description.urls | | Details about URLs in description. |
entities.description.hashtags | | Details about hashtags in description. |
entities.description.cashtags | | Details about cashtags in description. |
Labeled Datasets / and Variations
We collect images from TPDNE by repeatedly querying the website, mimicking a user creating a fake profile. We analogously take the first real images from the FFHQ dataset. To avoid an unwanted bias based on image processing, we convert the PNG files from FFHQ to JPEG using the same parameters as TPDNE. Then, to obtain processed images as they would appear on Twitter ( and ), we upload each image as a profile image and download it. We observed a difference in the image processing between API-based and browser-based uploads. Images uploaded with the API kept their resolution, while images uploaded in the browser were resized to pixels. As the majority of in-the-wild images has the resized resolution, we select the browser-based approach and automate the upload using the web automation framework Selenium \parenciteselenium. To obtain the zoomed-in versions ( and ), the automated upload procedure is extended by first zooming into each image by a random amount and then moving the image by a random x- and y-offset. We ensure that the image still looks like a plausible profile image at the maximum zoom rate.
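The zoom-and-shift step described above can be illustrated as a crop-box computation. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `random_zoom_box`, the parameter `max_zoom`, and its default value are hypothetical, and the visible window is assumed to stay entirely inside the original image.

```python
import random

def random_zoom_box(width, height, max_zoom=1.5, rng=random):
    """Return a crop box (left, top, right, bottom) for a random zoom and shift.

    max_zoom is a hypothetical upper bound on the zoom factor (an assumption).
    """
    zoom = rng.uniform(1.0, max_zoom)   # 1.0 means no zoom
    crop_w = width / zoom               # visible region shrinks as zoom grows
    crop_h = height / zoom
    # Random x- and y-offsets, limited so the window stays inside the image.
    left = rng.uniform(0.0, width - crop_w)
    top = rng.uniform(0.0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)
```

Bounding the zoom factor corresponds to the requirement that the result still looks like a plausible profile image at the maximum zoom rate.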
A.2. Pre-Filter
BlazeFace \parencitebazarevskyBlazeFace2019 predicts a bounding box as well as the x- and y-positions of six facial landmarks (eyes, ears, mouth, and nose) in normalized coordinates between 0 and 1. If an image contains multiple faces, we select the one with the largest bounding box.
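The selection rule for images with multiple faces can be sketched as follows. The detection format is an assumption made for illustration: each detection is a dictionary with a `"box"` entry holding `(x_min, y_min, x_max, y_max)` in normalized coordinates, which is not necessarily the output format of BlazeFace, and `largest_face` is a hypothetical helper name.

```python
def largest_face(detections):
    """Return the detection with the largest bounding-box area, or None if empty.

    Each detection is assumed to be a dict with a "box" entry
    (x_min, y_min, x_max, y_max) in normalized coordinates in [0, 1].
    """
    def area(det):
        x_min, y_min, x_max, y_max = det["box"]
        # Guard against degenerate boxes with negative extent.
        return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)

    return max(detections, key=area, default=None)
```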
A.3. Classifier
Our architecture and training procedure are adapted from \textcitewangCNNgeneratedImagesAre2020. We follow the common practice of initializing a ResNet-50 \parenciteheDeepResidualLearning2016 with weights from an image classifier trained on ImageNet \parenciterussakovskyImageNetLargeScale2015 and replace the final layer to reflect the binary classification setting. During training, we use a batch size of and optimize the model using Adam \parencitekingmaAdamMethodStochastic2015 and binary cross-entropy loss. In the case of we ensure balanced sampling of real/proxy-labeled real and fake samples. The learning rate is reduced by a factor of 10 if the validation loss does not decrease by during 5 epochs. We perform early stopping once the learning rate becomes smaller than . For training , the images in and are resized to using bilinear interpolation to match the profile image dimensions of Twitter. The training data is augmented using three kinds of perturbations, each applied with probability : Gaussian blurring with a kernel size of and uniformly sampled from , JPEG compression with quality uniformly sampled from , and resizing, with scale and aspect ratio uniformly sampled from and , respectively. During training, we randomly extract crops of size , while we take the center crop of the same size during validation and testing.
Appendix B Inversion Examples
Figure 11 depicts example images to demonstrate the assisted manual labeling. The left image is the original while the right image is its reconstruction obtained by GAN inversion. For the real image from , we observe that the background is inaccurate and the face is slightly blurred. In contrast, the generated image from can be reconstructed very accurately, including the background.
Appendix C Pre-Trained Detectors
Here we provide details on the three existing pre-trained classifiers we evaluate in Section 7.1. \parencitewangCNNgeneratedImagesAre2020 is the model on which our detector is based. However, it is trained on a diverse set of images generated by ProGAN \parencitekarrasProgressiveGrowingGANs2018 and corresponding real images from LSUN \parenciteyuLSUNConstructionLargescale2016. We select the version Blur+JPEG (0.1) since the authors report good performance on images generated by StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020. \parencitegragnanielloAreGANGenerated2021 is an improved version of that avoids downsampling in the first layer of the ResNet-50 \parenciteheDeepResidualLearning2016 backbone to preserve high-frequency artifacts (at the cost of a larger model). Besides training on ProGAN \parencitekarrasProgressiveGrowingGANs2018 images, the authors provide a detector trained on StyleGAN2 \parencitekarrasAnalyzingImprovingImage2020 images, which we select since it should yield the best results on our dataset. Finally, follows a different approach and leverages the feature space of a pre-trained vision transformer (CLIP-ViT \parencitedosovitskiyImageWorth16x162020, radfordLearningTransferableVisual2021). It uses a single linear layer on top (trained on ProGAN \parencitekarrasProgressiveGrowingGANs2018 images) to predict whether an image is real or fake.
Appendix D Duplicate Image Detection
Despite the trivial access to generated faces using TPDNE, creators of fake account clusters might use the same face for multiple accounts. To identify such duplicates, we need an approach that is robust to subtle differences caused by varying image processing. We adapt the technique used by previous works \parencitegarimellaImagesMisinformationPolitical2020,zannettouOriginsMemesMeans2018,wangUnderstandingUseImages2023 and cluster images based on their perceptual hashes (pHashes). Perceptual image hashing \parencitefaridOverviewPerceptualHashing2021 aims to extract a meaningful representation of an image that does not depend on individual pixel values, but on the perceived content. The algorithm we use \parencitebuchnerImageHash achieves this by deriving 64 bits from the DCT coefficients belonging to the lower frequencies of an image. To obtain groups of duplicate images, we apply the DBSCAN \parenciteesterDensitybasedAlgorithmDiscovering1996 clustering algorithm to our calculated pHashes. We use the implementation from scikit-learn \parencitescikit-learn and set the minimum number of elements to 2. We empirically find that we obtain meaningful clusters by setting the maximum allowed Hamming distance between two pHashes to 3.
In total, we identify 540 groups of duplicated images with an average size of 4.88 images. The distribution of the sizes is given in Figure 12. About half of all groups consist of only two or three duplicated images, while the most frequently used faces appeared in 18 profiles.
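The grouping of pHashes described in this appendix can be illustrated with a dependency-free sketch. It is not the scikit-learn pipeline itself: it assumes that each pHash is available as a 64-bit integer, and it relies on the observation that, with a minimum of 2 elements per cluster, DBSCAN clusters should coincide with the connected components (of size at least 2) of the graph linking hashes within the allowed Hamming distance. The names `hamming` and `duplicate_groups` are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of differing bits between two 64-bit hashes given as ints."""
    return bin(a ^ b).count("1")

def duplicate_groups(hashes, max_distance=3):
    """Group indices of hashes linked by chains of near-duplicates."""
    parent = list(range(len(hashes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of hashes within the allowed Hamming distance.
    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming(hashes[i], hashes[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(hashes)):
        groups[find(i)].append(i)
    # Keep only groups with at least two images, i.e. actual duplicates.
    return sorted(g for g in groups.values() if len(g) >= 2)
```

The pairwise comparison is quadratic in the number of hashes, so this sketch is only meant for small inputs; note also that two images in the same group may differ by more than the threshold if they are connected through intermediate hashes.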
Appendix E Evaluation of Alignment-Based Detection
In the concurrent work by \textciteyangCharacteristicsPrevalenceFake2024, the authors identify GAN-generated faces on Twitter using a method that is related to our concept of alignment (cf. Section 3.3). They define the GANEyeDistance as the normalized Euclidean distance between the actual and expected location of each eye. They propose to consider an image to be potentially GAN-generated if . To reach a final decision, they propose to manually classify images based on visual artifacts. While this approach is easy to implement and computationally efficient, we find that is suboptimal regarding (a) the number of false positives (causing a large manual workload) and (b) the number of false negatives (overlooking generated faces that are not aligned).
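To make the notion of an eye-based alignment distance concrete, the following sketch computes a distance between detected and expected eye positions in normalized coordinates. It is an illustration under loud assumptions and not the definition from the cited work: the expected eye positions are placeholder values, averaging over both eyes is an assumed aggregation, and the name `eye_distance` is hypothetical.

```python
import math

# Hypothetical expected eye centers in normalized image coordinates (x, y).
# These values are placeholders, not taken from the cited work.
EXPECTED_LEFT_EYE = (0.38, 0.48)
EXPECTED_RIGHT_EYE = (0.62, 0.48)

def eye_distance(left_eye, right_eye):
    """Mean Euclidean distance between detected and expected eye positions.

    Averaging over both eyes is an assumption made for this sketch.
    """
    d_left = math.dist(left_eye, EXPECTED_LEFT_EYE)
    d_right = math.dist(right_eye, EXPECTED_RIGHT_EYE)
    return (d_left + d_right) / 2.0
```

An image would then be treated as a candidate if this distance falls below a chosen threshold, which is why even a slight zoom or shift of an otherwise aligned face can move it out of the candidate set.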
We test with the suggested threshold on randomly chosen images from (about , which yields 730 candidate profiles. For all samples in , the estimated number of candidates is therefore . Classifying these images manually would require an excessive amount of effort.
On the other hand, we find 440 images in that are detected as fake by but are overlooked when classifying based on . Naturally, it can be assumed that in this subset our classifier has a higher number of false positives, since most generated images are in fact aligned. Still, after manual inspection, we rate 303 of these images to be definitely or very likely generated. Note that manual labeling is more challenging on these images since we cannot resort to GAN inversion. Figure 13 depicts some examples together with their value of . One can see that zooming in by a small amount is sufficient to cause a misalignment. We consider it probable that malicious accounts do this on purpose to appear more credible and avoid detection based on facial landmarks.