-
The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset
Authors:
Margherita Gambini,
Serena Tardelli,
Maurizio Tesconi
Abstract:
The discourse around conspiracy theories is currently thriving amidst the rampant misinformation in online environments. Research in this field has been focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throug…
▽ More
The discourse around conspiracy theories is currently thriving amidst the rampant misinformation in online environments. Research in this field has been focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022. Our approach centers on data collection that is independent of specific conspiracy theories and information operations. Additionally, our dataset includes a control group comprising randomly selected users who can be fairly compared to the individuals involved in conspiracy activities. This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users exhibit similarity in terms of their profile metadata characteristics. However, they diverge significantly in terms of behavior and activity, particularly regarding the discussed topics, the terminology used, and their stance on trending subjects. In addition, we find no significant disparity in the presence of bot users between the two groups. Finally, we develop a classifier to identify conspiracy users using features borrowed from bot, troll and linguistic literature. The results demonstrate a high accuracy level (with an F1 score of 0.94), enabling us to uncover the most discriminating features associated with conspiracy-related accounts.
△ Less
Submitted 5 February, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Tweets2Stance: Users stance detection exploiting Zero-Shot Learning Algorithms on Tweets
Authors:
Margherita Gambini,
Tiziano Fagni,
Caterina Senette,
Maurizio Tesconi
Abstract:
In the last years there has been a growing attention towards predicting the political orientation of active social media users, being this of great help to study political forecasts, opinion dynamics modeling and users polarization. Existing approaches, mainly targeting Twitter users, rely on content-based analysis or are based on a mixture of content, network and communication analysis. The recen…
▽ More
In the last years there has been a growing attention towards predicting the political orientation of active social media users, being this of great help to study political forecasts, opinion dynamics modeling and users polarization. Existing approaches, mainly targeting Twitter users, rely on content-based analysis or are based on a mixture of content, network and communication analysis. The recent research perspective exploits the fact that a user's political affinity mainly depends on his/her positions on major political and social issues, thus shifting the focus on detecting the stance of users through user-generated content shared on social networks. The work herein described focuses on a completely unsupervised stance detection framework that predicts the user's stance about specific social-political statements by exploiting content-based analysis of its Twitter timeline. The ground-truth user's stance may come from Voting Advice Applications, online tools that help citizens to identify their political leanings by comparing their political preferences with party political stances. Starting from the knowledge of the agreement level of six parties on 20 different statements, the objective of the study is to predict the stance of a Party p in regard to each statement s exploiting what the Twitter Party account wrote on Twitter. To this end we propose Tweets2Stance (T2S), a novel and totally unsupervised stance detector framework which relies on the zero-shot learning technique to quickly and accurately operate on non-labeled data. Interestingly, T2S can be applied to any social media user for any context of interest, not limited to the political one. Results obtained from multiple experiments show that, although the general maximum F1 value is 0.4, T2S can correctly predict the stance with a general minimum MAE of 1.13, which is a great achievement considering the task complexity.
△ Less
Submitted 22 April, 2022;
originally announced April 2022.
-
TweepFake: about Detecting Deepfake Tweets
Authors:
Tiziano Fagni,
Fabrizio Falchi,
Margherita Gambini,
Antonio Martella,
Maurizio Tesconi
Abstract:
The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to…
▽ More
The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, ho** to contaminate public debate. To prevent this, it is crucial to develop deepfake social media messages detection systems. However, to the best of our knowledge no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of hel** the research in this detection field, we collected the first dataset of \real deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to have an overall balanced dataset of 25,572 tweets (half human and half bots generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that Tweepfake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle the deepfake detection on social media messages as well.
△ Less
Submitted 6 May, 2021; v1 submitted 31 July, 2020;
originally announced August 2020.