Measuring and Modeling the Free Content Web

Authors: Abdulrahman Alabduljabbar, Runyu Ma, Ahmed Abusnaina, Rhongho Jang, Songqing Chen, DaeHun Nyang, and David Mohaisen

Abstract: Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level.
Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 30 pages, 3 tables, 9 figures. Under review by Computer Networks

arXiv:2304.13278

Understanding the Security and Performance of the Web Presence of Hospitals: A Measurement Study

Authors: Mohammed Alkinoon, Abdulrahman Alabduljabbar, Hattan Althebeiti, Rhongho Jang, DaeHun Nyang, David Mohaisen

Abstract: Using a total of 4,774 hospitals categorized as government, non-profit, and proprietary hospitals, this study provides the first measurement-based analysis of hospitals' websites and connects the findings with data breaches through a correlation analysis. We study the security attributes of three categories, collectively and in contrast, against domain name, content, and SSL certificate-level features. We find that each type of hospital has a distinctive characteristic of its utilization of domain name registrars, top-level domain distribution, and domain creation distribution, as well as content type and HTTP request features. Security-wise, and consistent with the general population of websites, only 1% of government hospitals utilized DNSSEC, in contrast to 6% of the proprietary hospitals. Alarmingly, we found that 25% of the hospitals used plain HTTP, in contrast to 20% in the general web population. Alarmingly too, we found that 8%-84% of the hospitals, depending on their type, had some malicious contents, which are mostly attributed to the lack of maintenance. We conclude with a correlation analysis against 414 confirmed and manually vetted hospitals' data breaches. Among other interesting findings, our study highlights that the security attributes highlighted in our analysis of hospital websites are forming a very strong indicator of their likelihood of being breached.
Our analyses are the first step towards understanding patient online privacy, highlighting the lack of basic security in many hospitals' websites and opening various potential research directions.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 10 pages, 5 tables, 10 figures

arXiv:2210.12083

Do Content Management Systems Impact the Security of Free Content Websites? A Correlation Analysis

Authors: Mohammed Alaqdhi, Abdulrahman Alabduljabbar, Kyle Thomas, Saeed Salem, DaeHun Nyang, David Mohaisen

Abstract: This paper investigates the potential causes of the vulnerabilities of free content websites to address risks and maliciousness. Assembling more than 1,500 websites with free and premium content, we identify their content management system (CMS) and malicious attributes. We use frequency analysis at both the aggregate and per category of content (books, games, movies, music, and software), utilizing the unpatched vulnerabilities, total vulnerabilities, malicious count, and percentiles to uncover trends and affinities of usage and maliciousness of CMS's and their contribution to those websites. Moreover, we find that, despite the significant number of custom code websites, the use of CMS's is pervasive, with varying trends across types and categories. Finally, we find that even a small number of unpatched vulnerabilities in popular CMS's could be a potential cause for significant maliciousness.

Submitted 21 October, 2022; originally announced October 2022.

Comments: 7 pages, 1 figure, 6 tables

arXiv:2108.13373

ML-based IoT Malware Detection Under Adversarial Settings: A Systematic Evaluation

Authors: Ahmed Abusnaina, Afsah Anwar, Sultan Alshamrani, Abdulrahman Alabduljabbar, RhongHo Jang, Daehun Nyang, David Mohaisen

Abstract: The rapid growth of the Internet of Things (IoT) devices is paralleled by them being on the front-line of malicious attacks. This has led to an explosion in the number of IoT malware, with continued mutations, evolution, and sophistication. These malicious software are detected using machine learning (ML) algorithms alongside the traditional signature-based methods. Although ML-based detectors improve the detection performance, they are susceptible to malware evolution and sophistication, making them limited to the patterns that they have been trained upon. This continuous trend motivates the large body of literature on malware analysis and detection research, with many systems emerging constantly, and outperforming their predecessors. In this work, we systematically examine the state-of-the-art malware detection approaches, that utilize various representation and learning techniques, under a range of adversarial settings. Our analyses highlight the instability of the proposed detectors in learning patterns that distinguish the benign from the malicious software. The results exhibit that software mutations with functionality-preserving operations, such as stripping and padding, significantly deteriorate the accuracy of such detectors. Additionally, our analysis of the industry-standard malware detectors shows their instability to the malware mutations.

Submitted 30 August, 2021; originally announced August 2021.

Comments: 11 pages

arXiv:2103.14221

ShellCore: Automating Malicious IoT Software Detection by Using Shell Commands Representation

Authors: Hisham Alasmary, Afsah Anwar, Ahmed Abusnaina, Abdulrahman Alabduljabbar, Mohammad Abuhamad, An Wang, DaeHun Nyang, Amro Awad, David Mohaisen

Abstract: The Linux shell is a command-line interpreter that provides users with a command interface to the operating system, allowing them to perform a variety of functions. Although very useful in building capabilities at the edge, the Linux shell can be exploited, giving adversaries a prime opportunity to use them for malicious activities. With access to IoT devices, malware authors can abuse the Linux shell of those devices to propagate infections and launch large-scale attacks, e.g., DDoS. In this work, we provide a first look at shell commands used in Linux-based IoT malware towards detection. We analyze malicious shell commands found in IoT malware and build a neural network-based model, ShellCore, to detect malicious shell commands. Namely, we collected a large dataset of shell commands, including malicious commands extracted from 2,891 IoT malware samples and benign commands collected from real-world network traffic analysis and volunteered data from Linux users. Using conventional machine and deep learning-based approaches trained with term- and character-level features, ShellCore is shown to achieve an accuracy of more than 99% in detecting malicious shell commands and files (i.e., binaries).

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.14217

Understanding Internet of Things Malware by Analyzing Endpoints in their Static Artifacts

Authors: Afsah Anwar, Jinchun Choi, Abdulrahman Alabduljabbar, Hisham Alasmary, Jeffrey Spaulding, An Wang, Songqing Chen, DaeHun Nyang, Amro Awad, David Mohaisen

Abstract: The lack of security measures among the Internet of Things (IoT) devices and their persistent online connection gives adversaries a prime opportunity to target them or even abuse them as intermediary targets in larger attacks such as distributed denial-of-service (DDoS) campaigns. In this paper, we analyze IoT malware and focus on the endpoints reachable on the public Internet, that play an essential part in the IoT malware ecosystem. Namely, we analyze endpoints acting as dropzones and their targets to gain insights into the underlying dynamics in this ecosystem, such as the affinity between the dropzones and their target IP addresses, and the different patterns among endpoints. Towards this goal, we reverse-engineer 2,423 IoT malware samples and extract strings from them to obtain IP addresses. We further gather information about these endpoints from public Internet-wide scanners, such as Shodan and Censys. For the masked IP addresses, we examine the Classless Inter-Domain Routing (CIDR) networks accumulating to more than 100 million (78.2% of total active public IPv4 addresses) endpoints. Our investigation from four different perspectives provides profound insights into the role of endpoints in IoT malware attacks, which deepens our understanding of IoT malware ecosystems and can assist future defenses.

Submitted 25 March, 2021; originally announced March 2021.

Showing 1–6 of 6 results for author: Alabduljabbar, A