-
A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists
Authors:
Benjamin Zi Hao Zhao,
Muhammad Ikram,
Hassan Jameel Asghar,
Mohamed Ali Kaafar,
Abdelberi Chaabane,
Kanchana Thilakarathna
Abstract:
This paper focuses on reporting of Internet malicious activity (or mal-activity in short) by public blacklists with the objective of providing a systematic characterization of what has been reported over the years, and more importantly, the evolution of reported activities. Using an initial seed of 22 blacklists, covering the period from January 2007 to June 2017, we collect more than 51 million m…
▽ More
This paper focuses on reporting of Internet malicious activity (or mal-activity in short) by public blacklists with the objective of providing a systematic characterization of what has been reported over the years, and more importantly, the evolution of reported activities. Using an initial seed of 22 blacklists, covering the period from January 2007 to June 2017, we collect more than 51 million mal-activity reports involving 662K unique IP addresses worldwide. Leveraging the Wayback Machine, antivirus (AV) tool reports and several additional public datasets (e.g., BGP Route Views and Internet registries) we enrich the data with historical meta-information including geo-locations (countries), autonomous system (AS) numbers and types of mal-activity. Furthermore, we use the initially labelled dataset of approx 1.57 million mal-activities (obtained from public blacklists) to train a machine learning classifier to classify the remaining unlabeled dataset of approx 44 million mal-activities obtained through additional sources. We make our unique collected dataset (and scripts used) publicly available for further research.
The main contributions of the paper are a novel means of report collection, with a machine learning approach to classify reported activities, characterization of the dataset and, most importantly, temporal analysis of mal-activity reporting behavior. Inspired by P2P behavior modeling, our analysis shows that some classes of mal-activities (e.g., phishing) and a small number of mal-activity sources are persistent, suggesting that either blacklist-based prevention systems are ineffective or have unreasonably long update periods. Our analysis also indicates that resources can be better utilized by focusing on heavy mal-activity contributors, which constitute the bulk of mal-activities.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web
Authors:
Tobias Lauinger,
Abdelberi Chaabane,
Sajjad Arshad,
William Robertson,
Christo Wilson,
Engin Kirda
Abstract:
Web developers routinely rely on third-party Java-Script libraries such as jQuery to enhance the functionality of their sites. However, if not properly maintained, such dependencies can create attack vectors allowing a site to be compromised.
In this paper, we conduct the first comprehensive study of client-side JavaScript library usage and the resulting security implications across the Web. Usi…
▽ More
Web developers routinely rely on third-party Java-Script libraries such as jQuery to enhance the functionality of their sites. However, if not properly maintained, such dependencies can create attack vectors allowing a site to be compromised.
In this paper, we conduct the first comprehensive study of client-side JavaScript library usage and the resulting security implications across the Web. Using data from over 133 k websites, we show that 37% of them include at least one library with a known vulnerability; the time lag behind the newest release of a library is measured in the order of years. In order to better understand why websites use so many vulnerable or outdated libraries, we track causal inclusion relationships and quantify different scenarios. We observe sites including libraries in ad hoc and often transitive ways, which can lead to different versions of the same library being loaded into the same document at the same time. Furthermore, we find that libraries included transitively, or via ad and tracking code, are more likely to be vulnerable. This demonstrates that not only website administrators, but also the dynamic architecture and developers of third-party services are to blame for the Web's poor state of library management.
The results of our work underline the need for more thorough approaches to dependency management, code maintenance and third-party code inclusion on the Web.
△ Less
Submitted 13 February, 2020; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Censorship in the Wild: Analyzing Internet Filtering in Syria
Authors:
Abdelberi Chaabane,
Terence Chen,
Mathieu Cunche,
Emiliano De Cristofaro,
Arik Friedman,
Mohamed Ali Kaafar
Abstract:
Internet censorship is enforced by numerous governments worldwide, however, due to the lack of publicly available information, as well as the inherent risks of performing active measurements, it is often hard for the research community to investigate censorship practices in the wild. Thus, the leak of 600GB worth of logs from 7 Blue Coat SG-9000 proxies, deployed in Syria to filter Internet traffi…
▽ More
Internet censorship is enforced by numerous governments worldwide, however, due to the lack of publicly available information, as well as the inherent risks of performing active measurements, it is often hard for the research community to investigate censorship practices in the wild. Thus, the leak of 600GB worth of logs from 7 Blue Coat SG-9000 proxies, deployed in Syria to filter Internet traffic at a country scale, represents a unique opportunity to provide a detailed snapshot of a real-world censorship ecosystem. This paper presents the methodology and the results of a measurement analysis of the leaked Blue Coat logs, revealing a relatively stealthy, yet quite targeted, censorship. We find that traffic is filtered in several ways: using IP addresses and domain names to block subnets or websites, and keywords or categories to target specific content. We show that keyword-based censorship produces some collateral damage as many requests are blocked even if they do not relate to sensitive content. We also discover that Instant Messaging is heavily censored, while filtering of social media is limited to specific pages. Finally, we show that Syrian users try to evade censorship by using web/socks proxies, Tor, VPNs, and BitTorrent. To the best of our knowledge, our work provides the first analytical look into Internet filtering in Syria.
△ Less
Submitted 5 November, 2014; v1 submitted 14 February, 2014;
originally announced February 2014.
-
When Privacy meets Security: Leveraging personal information for password cracking
Authors:
Claude Castelluccia,
Abdelberi Chaabane,
Markus Dürmuth,
Daniele Perito
Abstract:
Passwords are widely used for user authentication and, despite their weaknesses, will likely remain in use in the foreseeable future. Human-generated passwords typically have a rich structure, which makes them susceptible to guessing attacks. In this paper, we study the effectiveness of guessing attacks based on Markov models. Our contributions are two-fold. First, we propose a novel password crac…
▽ More
Passwords are widely used for user authentication and, despite their weaknesses, will likely remain in use in the foreseeable future. Human-generated passwords typically have a rich structure, which makes them susceptible to guessing attacks. In this paper, we study the effectiveness of guessing attacks based on Markov models. Our contributions are two-fold. First, we propose a novel password cracker based on Markov models, which builds upon and extends ideas used by Narayanan and Shmatikov (CCS 2005). In extensive experiments we show that it can crack up to 69% of passwords at 10 billion guesses, more than all probabilistic password crackers we compared again t. Second, we systematically analyze the idea that additional personal information about a user helps in speeding up password guessing. We find that, on average and by carefully choosing parameters, we can guess up to 5% more passwords, especially when the number of attempts is low. Furthermore, we show that the gain can go up to 30% for passwords that are actually based on personal attributes. These passwords are clearly weaker and should be avoided. Our cracker could be used by an organization to detect and reject them. To the best of our knowledge, we are the first to systematically study the relationship between chosen passwords and users' personal information. We test and validate our results over a wide collection of leaked password databases.
△ Less
Submitted 24 April, 2013;
originally announced April 2013.
-
Privacy in Content-Oriented Networking: Threats and Countermeasures
Authors:
Abdelberi Chaabane,
Emiliano De Cristofaro,
Mohammed-Ali Kaafar,
Ersin Uzun
Abstract:
As the Internet struggles to cope with scalability, mobility, and security issues, new network architectures are being proposed to better accommodate the needs of modern systems and applications. In particular, Content-Oriented Networking (CON) has emerged as a promising next-generation Internet architecture: it sets to decouple content from hosts, at the network layer, by naming data rather than…
▽ More
As the Internet struggles to cope with scalability, mobility, and security issues, new network architectures are being proposed to better accommodate the needs of modern systems and applications. In particular, Content-Oriented Networking (CON) has emerged as a promising next-generation Internet architecture: it sets to decouple content from hosts, at the network layer, by naming data rather than hosts. CON comes with a potential for a wide range of benefits, including reduced congestion and improved delivery speed by means of content caching, simpler configuration of network devices, and security at the data level. However, it remains an interesting open question whether or not, and to what extent, this emerging networking paradigm bears new privacy challenges. In this paper, we provide a systematic privacy analysis of CON and the common building blocks among its various architectural instances in order to highlight emerging privacy threats, and analyze a few potential countermeasures. Finally, we present a comparison between CON and today's Internet in the context of a few privacy concepts, such as, anonymity, censoring, traceability, and confidentiality.
△ Less
Submitted 30 July, 2013; v1 submitted 21 November, 2012;
originally announced November 2012.
-
De-anonymizing BitTorrent Users on Tor
Authors:
Stevens Le Blond,
Pere Manils,
Abdelberi Chaabane,
Mohamed Ali Kaafar,
Arnaud Legout,
Claude Castellucia,
Walid Dabbous
Abstract:
Some BitTorrent users are running BitTorrent on top of Tor to preserve their privacy. In this extended abstract, we discuss three different attacks to reveal the IP address of BitTorrent users on top of Tor. In addition, we exploit the multiplexing of streams from different applications into the same circuit to link non-BitTorrent applications to revealed IP addresses.
Some BitTorrent users are running BitTorrent on top of Tor to preserve their privacy. In this extended abstract, we discuss three different attacks to reveal the IP address of BitTorrent users on top of Tor. In addition, we exploit the multiplexing of streams from different applications into the same circuit to link non-BitTorrent applications to revealed IP addresses.
△ Less
Submitted 8 April, 2010;
originally announced April 2010.