Search | arXiv e-print repository

You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

Authors: Beatrice Perez, Mirco Musolesi, Gianluca Stringhini

Abstract: Metadata are associated to most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still catergorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message. In this paper, we use Twitter as a case study… ▽ More Metadata are associated to most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still catergorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter, but, more generally, for most social media platforms. △ Less

Submitted 14 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: 11 pages, 13 figures. Published in the Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, USA

arXiv:1803.03448 [pdf, other]

A Family of Droids -- Android Malware Detection via Behavioral Modeling: Static vs Dynamic Analysis

Authors: Lucky Onwuzurike, Mario Almeida, Enrico Mariconti, Jeremy Blackburn, Gianluca Stringhini, Emiliano De Cristofaro

Abstract: Following the increasing popularity of mobile ecosystems, cybercriminals have increasingly targeted them, designing and distributing malicious apps that steal information or cause harm to the device's owner. Aiming to counter them, detection techniques based on either static or dynamic analysis that model Android malware, have been proposed. While the pros and cons of these analysis techniques are… ▽ More Following the increasing popularity of mobile ecosystems, cybercriminals have increasingly targeted them, designing and distributing malicious apps that steal information or cause harm to the device's owner. Aiming to counter them, detection techniques based on either static or dynamic analysis that model Android malware, have been proposed. While the pros and cons of these analysis techniques are known, they are usually compared in the context of their limitations e.g., static analysis is not able to capture runtime behaviors, full code coverage is usually not achieved during dynamic analysis, etc. Whereas, in this paper, we analyze the performance of static and dynamic analysis methods in the detection of Android malware and attempt to compare them in terms of their detection performance, using the same modeling approach. To this end, we build on MaMaDroid, a state-of-the-art detection system that relies on static analysis to create a behavioral model from the sequences of abstracted API calls. Then, aiming to apply the same technique in a dynamic analysis setting, we modify CHIMP, a platform recently proposed to crowdsource human inputs for app testing, in order to extract API calls' sequences from the traces produced while executing the app on a CHIMP virtual device. We call this system AuntieDroid and instantiate it by using both automated (Monkey) and user-generated inputs. We find that combining both static and dynamic analysis yields the best performance, with F-measure reaching 0.92. We also show that static analysis is at least as effective as dynamic analysis, depending on how apps are stimulated during execution, and, finally, investigate the reasons for inconsistent misclassifications across methods. △ Less

Submitted 13 July, 2018; v1 submitted 9 March, 2018; originally announced March 2018.

Comments: A preliminary version of this paper appears in the Proceedings of 16th Annual Conference on Privacy, Security and Trust (PST 2018). This is the full version

arXiv:1802.05287 [pdf, other]

doi 10.1145/3184558.3191531

What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?

Authors: Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Abstract: Over the past few years, a number of new "fringe" communities, like 4chan or certain subreddits, have gained traction on the Web at a rapid pace. However, more often than not, little is known about how they evolve or what kind of activities they attract, despite recent research has shown that they influence how false information reaches mainstream communities. This motivates the need to monitor th… ▽ More Over the past few years, a number of new "fringe" communities, like 4chan or certain subreddits, have gained traction on the Web at a rapid pace. However, more often than not, little is known about how they evolve or what kind of activities they attract, despite recent research has shown that they influence how false information reaches mainstream communities. This motivates the need to monitor these communities and analyze their impact on the Web's information ecosystem. In August 2016, a new social network called Gab was created as an alternative to Twitter. It positions itself as putting "people and free speech first'", welcoming users banned or suspended from other social networks. In this paper, we provide, to the best of our knowledge, the first characterization of Gab. We collect and analyze 22M posts produced by 336K users between August 2016 and January 2018, finding that Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls. We also measure the prevalence of hate speech on the platform, finding it to be much higher than Twitter, but lower than 4chan's Politically Incorrect board. △ Less

Submitted 13 March, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: To appear in 3rd Cybersafety Workshop (WWW Companion, 2018)

arXiv:1802.00393 [pdf, other]

Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior

Authors: Antigoni-Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, Nicolas Kourtellis

Abstract: In recent years, offensive, abusive and hateful language, sexism, racism and other types of aggressive and cyberbullying behavior have been manifesting with increased frequency, and in many online social media platforms. In fact, past scientific work focused on studying these forms in popular media, such as Facebook and Twitter. Building on such work, we present an 8-month study of the various for… ▽ More In recent years, offensive, abusive and hateful language, sexism, racism and other types of aggressive and cyberbullying behavior have been manifesting with increased frequency, and in many online social media platforms. In fact, past scientific work focused on studying these forms in popular media, such as Facebook and Twitter. Building on such work, we present an 8-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior, at the same time. We propose an incremental and iterative methodology, that utilizes the power of crowdsourcing to annotate a large scale collection of tweets with a set of abuse-related labels. In fact, by applying our methodology including statistical analysis for label merging or elimination, we identify a reduced but robust set of labels. Finally, we offer a first overview and findings of our collected and annotated dataset of 100 thousand tweets, which we make publicly available for further scientific exploration. △ Less

Submitted 15 April, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

Comments: crowdsourcing, abusive behavior, hate speech, Twitter, aggression, bullying

MSC Class: 68T06 ACM Class: K.4.2

arXiv:1801.10396 [pdf, other]

Understanding Web Archiving Services and Their (Mis)Use on Social Media

Authors: Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini

Abstract: Web archiving services play an increasingly important role in today's information ecosystem, by ensuring the continuing availability of information, or by deliberately caching content that might get deleted or removed. Among these, the Wayback Machine has been proactively archiving, since 2001, versions of a large number of Web pages, while newer services like archive.is allow users to create on-d… ▽ More Web archiving services play an increasingly important role in today's information ecosystem, by ensuring the continuing availability of information, or by deliberately caching content that might get deleted or removed. Among these, the Wayback Machine has been proactively archiving, since 2001, versions of a large number of Web pages, while newer services like archive.is allow users to create on-demand snapshots of specific Web pages, which serve as time capsules that can be shared across the Web. In this paper, we present a large-scale analysis of Web archiving services and their use on social media, shedding light on the actors involved in this ecosystem, the content that gets archived, and how it is shared. We crawl and study: 1) 21M URLs from archive.is, spanning almost two years, and 2) 356K archive.is plus 391K Wayback Machine URLs that were shared on four social networks: Reddit, Twitter, Gab, and 4chan's Politically Incorrect board (/pol/) over 14 months. We observe that news and social media posts are the most common types of content archived, likely due to their perceived ephemeral and/or controversial nature. Moreover, URLs of archiving services are extensively shared on "fringe" communities within Reddit and 4chan to preserve possibly contentious content. Lastly, we find evidence of moderators nudging or even forcing users to use archives, instead of direct links, for news sources with opposing ideologies, potentially depriving them of ad revenue. △ Less

Submitted 9 April, 2018; v1 submitted 31 January, 2018; originally announced January 2018.

Comments: Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM 2018)

arXiv:1801.09288 [pdf, other]

Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and Their Influence on the Web

Authors: Savvas Zannettou, Tristan Caulfield, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Abstract: Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or "trolls." Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of… ▽ More Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or "trolls." Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and most importantly their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia's Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their account, as well as their general behavior and use of Twitter. Then, using Hawkes Processes, we quantify the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets. When looking at their ability of spreading news content and making it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today). △ Less

Submitted 4 March, 2019; v1 submitted 28 January, 2018; originally announced January 2018.

Comments: A preliminary version of this paper appears in the 4th Workshop on The Fourth Workshop on Computational Methods in Online Misbehavior (CyberSafety 2019) -- WWW'19 Companion Proceedings. This is the full version

arXiv:1801.08115 [pdf, other]

Eight Years of Rider Measurement in the Android Malware Ecosystem: Evolution and Lessons Learned

Authors: Guillermo Suarez-Tangil, Gianluca Stringhini

Abstract: Despite the growing threat posed by Android malware, the research community is still lacking a comprehensive view of common behaviors and trends exposed by malware families active on the platform. Without such view, the researchers incur the risk of develo** systems that only detect outdated threats, missing the most recent ones. In this paper, we conduct the largest measurement of Android malwa… ▽ More Despite the growing threat posed by Android malware, the research community is still lacking a comprehensive view of common behaviors and trends exposed by malware families active on the platform. Without such view, the researchers incur the risk of develo** systems that only detect outdated threats, missing the most recent ones. In this paper, we conduct the largest measurement of Android malware behavior to date, analyzing over 1.2 million malware samples that belong to 1.2K families over a period of eight years (from 2010 to 2017). We aim at understanding how the behavior of Android malware has evolved over time, focusing on repackaging malware. In this type of threats different innocuous apps are piggybacked with a malicious payload (rider), allowing inexpensive malware manufacturing. One of the main challenges posed when studying repackaged malware is slicing the app to split benign components apart from the malicious ones. To address this problem, we use differential analysis to isolate software components that are irrelevant to the campaign and study the behavior of malicious riders alone. Our analysis framework relies on collective repositories and recent advances on the systematization of intelligence extracted from multiple anti-virus vendors. We find that since its infancy in 2010, the Android malware ecosystem has changed significantly, both in the type of malicious activity performed by the malicious samples and in the level of obfuscation used by malware to avoid detection. We then show that our framework can aid analysts who attempt to study unknown malware families. Finally, we discuss what our findings mean for Android malware detection research, highlighting areas that need further attention by the research community. △ Less

Submitted 19 March, 2020; v1 submitted 24 January, 2018; originally announced January 2018.

Comments: A shorter version of this paper appears in IEEE Transactions on Dependable and Secure Computing. This is a more detailed version, extended from the pre-print

arXiv:1711.07477 [pdf, other]

MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (Extended Version)

Authors: Lucky Onwuzurike, Enrico Mariconti, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, Gianluca Stringhini

Abstract: As Android has become increasingly popular, so has malware targeting it, thus pushing the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue,… ▽ More As Android has become increasingly popular, so has malware targeting it, thus pushing the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue, we set to detect malware from a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a static-analysis based system that abstracts the API calls performed by an app to their class, package, or family, and builds a model from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is more resilient to API changes and the features set is of manageable size. We evaluate MaMaDroid using a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it effectively detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time (up to 0.87 F-measure two years after training). We also show that MaMaDroid remarkably outperforms DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to assess whether MaMaDroid's effectiveness mainly stems from the API abstraction or from the sequencing modeling, we also evaluate a variant of it that uses frequency (instead of sequences), of abstracted API calls. We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that include API calls that are equally or more frequently used by benign apps. △ Less

Submitted 2 March, 2019; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: A preliminary version of this paper appears in the Proceedings of the 24th Network and Distributed System Security Symposium (NDSS 2017) [arXiv:1612.04433]. This is the extended version, to appear in the ACM Transactions on Privacy and Security (ACM TOPS)

arXiv:1708.09058 [pdf, other]

POISED: Spotting Twitter Spam Off the Beaten Paths

Authors: Shirin Nilizadeh, Francois Labreche, Alireza Sedighian, Ali Zand, Jose Fernandez, Christopher Kruegel, Gianluca Stringhini, Giovanni Vigna

Abstract: Cybercriminals have found in online social networks a propitious medium to spread spam and malicious content. Existing techniques for detecting spam include predicting the trustworthiness of accounts and analyzing the content of these messages. However, advanced attackers can still successfully evade these defenses. Online social networks bring people who have personal connections or share commo… ▽ More Cybercriminals have found in online social networks a propitious medium to spread spam and malicious content. Existing techniques for detecting spam include predicting the trustworthiness of accounts and analyzing the content of these messages. However, advanced attackers can still successfully evade these defenses. Online social networks bring people who have personal connections or share common interests to form communities. In this paper, we first show that users within a networked community share some topics of interest. Moreover, content shared on these social network tend to propagate according to the interests of people. Dissemination paths may emerge where some communities post similar messages, based on the interests of those communities. Spam and other malicious content, on the other hand, follow different spreading patterns. In this paper, we follow this insight and present POISED, a system that leverages the differences in propagation between benign and malicious messages on social networks to identify spam and other unwanted content. We test our system on a dataset of 1.3M tweets collected from 64K users, and we show that our approach is effective in detecting malicious messages, reaching 91% precision and 93% recall. We also show that POISED's detection is more comprehensive than previous systems, by comparing it to three state-of-the-art spam detection systems that have been proposed by the research community in the past. POISED significantly outperforms each of these systems. Moreover, through simulations, we show how POISED is effective in the early detection of spam messages and how it is resilient against two well-known adversarial machine learning attacks. △ Less

Submitted 29 August, 2017; originally announced August 2017.

arXiv:1705.06947 [pdf, other]

The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources

Authors: Savvas Zannettou, Tristan Caulfield, Emiliano De Cristofaro, Nicolas Kourtellis, Ilias Leontiadis, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Abstract: As the number and the diversity of news outlets on the Web grow, so does the opportunity for "alternative" sources of information to emerge. Using large social networks like Twitter and Facebook, misleading, false, or agenda-driven information can quickly and seamlessly spread online, deceiving people or influencing their opinions. Also, the increased engagement of tightly knit communities, such a… ▽ More As the number and the diversity of news outlets on the Web grow, so does the opportunity for "alternative" sources of information to emerge. Using large social networks like Twitter and Facebook, misleading, false, or agenda-driven information can quickly and seamlessly spread online, deceiving people or influencing their opinions. Also, the increased engagement of tightly knit communities, such as Reddit and 4chan, further compounds the problem, as their users initiate and propagate alternative information, not only within their own communities, but also to different ones as well as various social media. In fact, these platforms have become an important piece of the modern information ecosystem, which, thus far, has not been studied as a whole. In this paper, we begin to fill this gap by studying mainstream and alternative news shared on Twitter, Reddit, and 4chan. By analyzing millions of posts around several axes, we measure how mainstream and alternative news flows between these platforms. Our results indicate that alt-right communities within 4chan and Reddit can have a surprising level of influence on Twitter, providing evidence that "fringe" communities often succeed in spreading alternative news to mainstream social networks and the greater Web. △ Less

Submitted 30 September, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

Comments: To appear in the 17th ACM Internet Measurement Conference (IMC 2017)

arXiv:1705.03345 [pdf, other]

Hate is not Binary: Studying Abusive Behavior of #GamerGate on Twitter

Authors: Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Athena Vakali

Abstract: Over the past few years, online bullying and aggression have become increasingly prominent, and manifested in many different forms on social media. However, there is little work analyzing the characteristics of abusive users and what distinguishes them from typical social media users. In this paper, we start addressing this gap by analyzing tweets containing a great large amount of abusiveness. We… ▽ More Over the past few years, online bullying and aggression have become increasingly prominent, and manifested in many different forms on social media. However, there is little work analyzing the characteristics of abusive users and what distinguishes them from typical social media users. In this paper, we start addressing this gap by analyzing tweets containing a great large amount of abusiveness. We focus on a Twitter dataset revolving around the Gamergate controversy, which led to many incidents of cyberbullying and cyberaggression on various gaming and social media platforms. We study the properties of the users tweeting about Gamergate, the content they post, and the differences in their behavior compared to typical Twitter users. We find that while their tweets are often seemingly about aggressive and hateful subjects, "Gamergaters" do not exhibit common expressions of online anger, and in fact primarily differ from typical users in that their tweets are less joyful. They are also more engaged than typical Twitter users, which is an indication as to how and why this controversy is still ongoing. Surprisingly, we find that Gamergaters are less likely to be suspended by Twitter, thus we analyze their properties to identify differences from typical users and what may have led to their suspension. We perform an unsupervised machine learning analysis to detect clusters of users who, though currently active, could be considered for suspension since they exhibit similar behaviors with suspended users. Finally, we confirm the usefulness of our analyzed features by emulating the Twitter suspension mechanism with a supervised learning method, achieving very good precision and recall. △ Less

Submitted 9 May, 2017; originally announced May 2017.

Comments: In 28th ACM Conference on Hypertext and Social Media (ACM HyperText 2017)

arXiv:1704.07759 [pdf, other]

Email Babel: Does Language Affect Criminal Activity in Compromised Webmail Accounts?

Authors: Emeric Bernard-Jones, Jeremiah Onaolapo, Gianluca Stringhini

Abstract: We set out to understand the effects of differing language on the ability of cybercriminals to navigate webmail accounts and locate sensitive information in them. To this end, we configured thirty Gmail honeypot accounts with English, Romanian, and Greek language settings. We populated the accounts with email messages in those languages by subscribing them to selected online newsletters. We hid em… ▽ More We set out to understand the effects of differing language on the ability of cybercriminals to navigate webmail accounts and locate sensitive information in them. To this end, we configured thirty Gmail honeypot accounts with English, Romanian, and Greek language settings. We populated the accounts with email messages in those languages by subscribing them to selected online newsletters. We hid email messages about fake bank accounts in fifteen of the accounts to mimic real-world webmail users that sometimes store sensitive information in their accounts. We then leaked credentials to the honey accounts via paste sites on the Surface Web and the Dark Web, and collected data for fifteen days. Our statistical analyses on the data show that cybercriminals are more likely to discover sensitive information (bank account information) in the Greek accounts than the remaining accounts, contrary to the expectation that Greek ought to constitute a barrier to the understanding of non-Greek visitors to the Greek accounts. We also extracted the important words among the emails that cybercriminals accessed (as an approximation of the keywords that they searched for within the honey accounts), and found that financial terms featured among the top words. In summary, we show that language plays a significant role in the ability of cybercriminals to access sensitive information hidden in compromised webmail accounts. △ Less

Submitted 25 April, 2017; originally announced April 2017.

arXiv:1702.07784 [pdf, other]

Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying

Authors: Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Athena Vakali

Abstract: Over the past few years, online aggression and abusive behaviors have occurred in many different forms and on a variety of platforms. In extreme cases, these incidents have evolved into hate, discrimination, and bullying, and even materialized into real-world threats and attacks against individuals or groups. In this paper, we study the Gamergate controversy. Started in August 2014 in the online g… ▽ More Over the past few years, online aggression and abusive behaviors have occurred in many different forms and on a variety of platforms. In extreme cases, these incidents have evolved into hate, discrimination, and bullying, and even materialized into real-world threats and attacks against individuals or groups. In this paper, we study the Gamergate controversy. Started in August 2014 in the online gaming world, it quickly spread across various social networking platforms, ultimately leading to many incidents of cyberbullying and cyberaggression. We focus on Twitter, presenting a measurement study of a dataset of 340k unique users and 1.6M tweets to study the properties of these users, the content they post, and how they differ from random Twitter users. We find that users involved in this "Twitter war" tend to have more friends and followers, are generally more engaged and post tweets with negative sentiment, less joy, and more hate than random users. We also perform preliminary measurements on how the Twitter suspension mechanism deals with such abusive behaviors. While we focus on Gamergate, our methodology to collect and analyze tweets related to aggressive and bullying activities is of independent interest. △ Less

Submitted 24 February, 2017; originally announced February 2017.

Comments: WWW Cybersafety Workshop 2017

arXiv:1702.06877 [pdf, other]

Mean Birds: Detecting Aggression and Bullying on Twitter

Authors: Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Athena Vakali

Abstract: In recent years, bullying and aggression against users on social media have grown significantly, causing serious consequences to victims of all demographics. In particular, cyberbullying affects more than half of young social media users worldwide, and has also led to teenage suicides, prompted by prolonged and/or coordinated digital harassment. Nonetheless, tools and technologies for understandin… ▽ More In recent years, bullying and aggression against users on social media have grown significantly, causing serious consequences to victims of all demographics. In particular, cyberbullying affects more than half of young social media users worldwide, and has also led to teenage suicides, prompted by prolonged and/or coordinated digital harassment. Nonetheless, tools and technologies for understanding and mitigating it are scarce and mostly ineffective. In this paper, we present a principled and scalable approach to detect bullying and aggressive behavior on Twitter. We propose a robust methodology for extracting text, user, and network-based attributes, studying the properties of cyberbullies and aggressors, and what features distinguish them from regular users. We find that bully users post less, participate in fewer online communities, and are less popular than normal users, while aggressors are quite popular and tend to include more negativity in their posts. We evaluate our methodology using a corpus of 1.6M tweets posted over 3 months, and show that machine learning classification algorithms can accurately detect users exhibiting bullying and aggressive behavior, achieving over 90% AUC. △ Less

Submitted 12 May, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

arXiv:1702.04256 [pdf, other]

What's in a Name? Understanding Profile Name Reuse on Twitter

Authors: Enrico Mariconti, Jeremiah Onaolapo, Syed Sharique Ahmad, Nicolas Nikiforou, Manuel Egele, Nick Nikiforakis, Gianluca Stringhini

Abstract: Users on Twitter are commonly identified by their profile names. These names are used when directly addressing users on Twitter, are part of their profile page URLs, and can become a trademark for popular accounts, with people referring to celebrities by their real name and their profile name, interchangeably. Twitter, however, has chosen to not permanently link profile names to their correspondin… ▽ More Users on Twitter are commonly identified by their profile names. These names are used when directly addressing users on Twitter, are part of their profile page URLs, and can become a trademark for popular accounts, with people referring to celebrities by their real name and their profile name, interchangeably. Twitter, however, has chosen to not permanently link profile names to their corresponding user accounts. In fact, Twitter allows users to change their profile name, and afterwards makes the old profile names available for other users to take. In this paper, we provide a large-scale study of the phenomenon of profile name reuse on Twitter. We show that this phenomenon is not uncommon, investigate the dynamics of profile name reuse, and characterize the accounts that are involved in it. We find that many of these accounts adopt abandoned profile names for questionable purposes, such as spreading malicious content, and using the profile name's popularity for search engine optimization. Finally, we show that this problem is not unique to Twitter (as other popular online social networks also release profile names) and argue that the risks involved with profile-name reuse outnumber the advantages provided by this feature. △ Less

Submitted 14 February, 2017; originally announced February 2017.

Comments: International World Wide Web Conference 2017

arXiv:1612.04433 [pdf, other]

MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models

Authors: Enrico Mariconti, Lucky Onwuzurike, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, Gianluca Stringhini

Abstract: The rise in popularity of the Android platform has resulted in an explosion of malware threats targeting it. As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training. In this paper, we present MaMaDroid, an Androi… ▽ More The rise in popularity of the Android platform has resulted in an explosion of malware threats targeting it. As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training. In this paper, we present MaMaDroid, an Android malware detection system that relies on app behavior. MaMaDroid builds a behavioral model, in the form of a Markov chain, from the sequence of abstracted API calls performed by an app, and uses it to extract features and perform classification. By abstracting calls to their packages or families, MaMaDroid maintains resilience to API changes and keeps the feature set size manageable. We evaluate its accuracy on a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it not only effectively detects malware (with up to 99% F-measure), but also that the model built by the system keeps its detection capabilities for long periods of time (on average, 86% and 75% F-measure, respectively, one and two years after training). Finally, we compare against DroidAPIMiner, a state-of-the-art system that relies on the frequency of API calls performed by apps, showing that MaMaDroid significantly outperforms it. △ Less

Submitted 20 November, 2017; v1 submitted 13 December, 2016; originally announced December 2016.

Comments: This paper appears in the Proceedings of 24th Network and Distributed System Security Symposium (NDSS 2017). Some experiments have been slightly updated in this version

arXiv:1610.08469 [pdf, other]

Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web

Authors: Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Osia, Hamid R. Rabiee, Hamed Haddadi, Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, Gianluca Stringhini

Abstract: Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate variet… ▽ More Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate varieties. In this paper, we present a large-scale study of recipes published on the web and their content, aiming to understand cuisines and culinary habits around the world. Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure. Our results confirm the strong effects of geographical and cultural similarities on recipes, health indicators, and culinary preferences across the globe. △ Less

Submitted 25 April, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

Comments: In the Web Science Track of 26th International World Wide Web Conference (WWW 2017)

arXiv:1610.03452 [pdf, other]

Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and Its Effects on the Web

Authors: Gabriel Emile Hine, Jeremiah Onaolapo, Emiliano De Cristofaro, Nicolas Kourtellis, Ilias Leontiadis, Riginos Samaras, Gianluca Stringhini, Jeremy Blackburn

Abstract: The discussion-board site 4chan has been part of the Internet's dark underbelly since its inception, and recent political events have put it increasingly in the spotlight. In particular, /pol/, the "Politically Incorrect" board, has been a central figure in the outlandish 2016 US election season, as it has often been linked to the alt-right movement and its rhetoric of hate and racism. However, 4c… ▽ More The discussion-board site 4chan has been part of the Internet's dark underbelly since its inception, and recent political events have put it increasingly in the spotlight. In particular, /pol/, the "Politically Incorrect" board, has been a central figure in the outlandish 2016 US election season, as it has often been linked to the alt-right movement and its rhetoric of hate and racism. However, 4chan remains relatively unstudied by the scientific community: little is known about its user base, the content it generates, and how it affects other parts of the Web. In this paper, we start addressing this gap by analyzing /pol/ along several axes, using a dataset of over 8M posts we collected over two and a half months. First, we perform a general characterization, showing that /pol/ users are well distributed around the world and that 4chan's unique features encourage fresh discussions. We also analyze content, finding, for instance, that YouTube links and hate speech are predominant on /pol/. Overall, our analysis not only provides the first measurement study of /pol/, but also insight into online harassment and hate speech trends in social media. △ Less

Submitted 1 October, 2017; v1 submitted 11 October, 2016; originally announced October 2016.

Comments: A shorter version of this paper appears in the Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM'17). Please cite the ICWSM'17 paper. Corresponding author: [email protected]

arXiv:1607.00801 [pdf, other]

Honey Sheets: What Happens to Leaked Google Spreadsheets?

Authors: Martin Lazarov, Jeremiah Onaolapo, Gianluca Stringhini

Abstract: Cloud-based documents are inherently valuable, due to the volume and nature of sensitive personal and business content stored in them. Despite the importance of such documents to Internet users, there are still large gaps in the understanding of what cybercriminals do when they illicitly get access to them by for example compromising the account credentials they are associated with. In this paper,… ▽ More Cloud-based documents are inherently valuable, due to the volume and nature of sensitive personal and business content stored in them. Despite the importance of such documents to Internet users, there are still large gaps in the understanding of what cybercriminals do when they illicitly get access to them by for example compromising the account credentials they are associated with. In this paper, we present a system able to monitor user activity on Google spreadsheets. We populated 5 Google spreadsheets with fake bank account details and fake funds transfer links. Each spreadsheet was configured to report details of accesses and clicks on links back to us. To study how people interact with these spreadsheets in case they are leaked, we posted unique links pointing to the spreadsheets on a popular paste site. We then monitored activity in the accounts for 72 days, and observed 165 accesses in total. We were able to observe interesting modifications to these spreadsheets performed by illicit accesses. For instance, we observed deletion of some fake bank account information, in addition to insults and warnings that some visitors entered in some of the spreadsheets. Our preliminary results show that our system can be used to shed light on cybercriminal behavior with regards to leaked online documents. △ Less

Submitted 4 July, 2016; originally announced July 2016.

arXiv:1607.00117 [pdf, other]

All Your Cards Are Belong To Us: Understanding Online Carding Forums

Authors: Andreas Haslebacher, Jeremiah Onaolapo, Gianluca Stringhini

Abstract: Underground online forums are platforms that enable trades of illicit services and stolen goods. Carding forums, in particular, are known for being focused on trading financial information. However, little evidence exists about the sellers that are present on carding forums, the precise types of products they advertise, and the prices buyers pay. Existing literature mainly focuses on the organisat… ▽ More Underground online forums are platforms that enable trades of illicit services and stolen goods. Carding forums, in particular, are known for being focused on trading financial information. However, little evidence exists about the sellers that are present on carding forums, the precise types of products they advertise, and the prices buyers pay. Existing literature mainly focuses on the organisation and structure of the forums. Furthermore, studies on carding forums are usually based on literature review, expert interviews, or data from forums that have already been shut down. This paper provides first-of-its-kind empirical evidence on active forums where stolen financial data is traded. We monitored 5 out of 25 discovered forums, collected posts from the forums over a three-month period, and analysed them quantitatively and qualitatively. We focused our analyses on products, prices, seller prolificacy, seller specialisation, and seller reputation. △ Less

Submitted 24 January, 2017; v1 submitted 1 July, 2016; originally announced July 2016.

arXiv:1511.06090 [pdf, other]

Master of Puppets: Analyzing And Attacking A Botnet For Fun And Profit

Authors: Genki Saito, Gianluca Stringhini

Abstract: A botnet is a network of compromised machines (bots), under the control of an attacker. Many of these machines are infected without their owners' knowledge, and botnets are the driving force behind several misuses and criminal activities on the Internet (for example spam emails). Depending on its topology, a botnet can have zero or more command and control (C&C) servers, which are centralized mach… ▽ More A botnet is a network of compromised machines (bots), under the control of an attacker. Many of these machines are infected without their owners' knowledge, and botnets are the driving force behind several misuses and criminal activities on the Internet (for example spam emails). Depending on its topology, a botnet can have zero or more command and control (C&C) servers, which are centralized machines controlled by the cybercriminal that issue commands and receive reports back from the co-opted bots. In this paper, we present a comprehensive analysis of the command and control infrastructure of one of the world's largest proprietary spamming botnets between 2007 and 2012: Cutwail/Pushdo. We identify the key functionalities needed by a spamming botnet to operate effectively. We then develop a number of attacks against the command and control logic of Cutwail that target those functionalities, and make the spamming operations of the botnet less effective. This analysis was made possible by having access to the source code of the C&C software, as well as setting up our own Cutwail C&C server, and by implementing a clone of the Cutwail bot. With the help of this tool, we were able to enumerate the number of bots currently registered with the C&C server, impersonate an existing bot to report false information to the C&C server, and manipulate spamming statistics of an arbitrary bot stored in the C&C database. Furthermore, we were able to make the control server inaccessible by conducting a distributed denial of service (DDoS) attack. Our results may be used by law enforcement and practitioners to develop better techniques to mitigate and cripple other botnets, since many of findings are generic and are due to the workflow of C&C communication in general. △ Less

Submitted 19 November, 2015; originally announced November 2015.

arXiv:1509.03531 [pdf, ps, other]

Towards Detecting Compromised Accounts on Social Networks

Authors: Manuel Egele, Gianluca Stringhini, Christopher Kruegel, Giovanni Vigna

Abstract: Compromising social network accounts has become a profitable course of action for cybercriminals. By hijacking control of a popular media or business account, attackers can distribute their malicious messages or disseminate fake information to a large user base. The impacts of these incidents range from a tarnished reputation to multi-billion dollar monetary losses on financial markets. In our pre… ▽ More Compromising social network accounts has become a profitable course of action for cybercriminals. By hijacking control of a popular media or business account, attackers can distribute their malicious messages or disseminate fake information to a large user base. The impacts of these incidents range from a tarnished reputation to multi-billion dollar monetary losses on financial markets. In our previous work, we demonstrated how we can detect large-scale compromises (i.e., so-called campaigns) of regular online social network users. In this work, we show how we can use similar techniques to identify compromises of individual high-profile accounts. High-profile accounts frequently have one characteristic that makes this detection reliable -- they show consistent behavior over time. We show that our system, were it deployed, would have been able to detect and prevent three real-world attacks against popular companies and news agencies. Furthermore, our system, in contrast to popular media, would not have fallen for a staged compromise instigated by a US restaurant chain for publicity reasons. △ Less

Submitted 11 September, 2015; originally announced September 2015.

Journal ref: TDSC-2014-10-0271.R1

arXiv:1410.6629 [pdf, other]

That Ain't You: Detecting Spearphishing Emails Before They Are Sent

Authors: Gianluca Stringhini, Olivier Thonnard

Abstract: One of the ways in which attackers try to steal sensitive information from corporations is by sending spearphishing emails. This type of emails typically appear to be sent by one of the victim's coworkers, but have instead been crafted by an attacker. A particularly insidious type of spearphishing emails are the ones that do not only claim to come from a trusted party, but were actually sent from… ▽ More One of the ways in which attackers try to steal sensitive information from corporations is by sending spearphishing emails. This type of emails typically appear to be sent by one of the victim's coworkers, but have instead been crafted by an attacker. A particularly insidious type of spearphishing emails are the ones that do not only claim to come from a trusted party, but were actually sent from that party's legitimate email account that was compromised in the first place. In this paper, we propose a radical change of focus in the techniques used for detecting such malicious emails: instead of looking for particular features that are indicative of attack emails, we look for possible indicators of impersonation of the legitimate owners. We present IdentityMailer, a system that validates the authorship of emails by learning the typical email-sending behavior of users over time, and comparing any subsequent email sent from their accounts against this model. Our experiments on real world e-mail datasets demonstrate that our system can effectively block advanced email attacks sent from genuine email accounts, which traditional protection systems are unable to detect. Moreover, we show that it is resilient to an attacker willing to evade the system. To the best of our knowledge, IdentityMailer is the first system able to identify spearphishing emails that are sent from within an organization, by a skilled attacker having access to a compromised email account. △ Less

Submitted 24 October, 2014; originally announced October 2014.

Showing 51–73 of 73 results for author: Stringhini, G