-
Untargeted Code Authorship Evasion with Seq2Seq Transformation
Authors:
Soohyeon Choi,
Rhongho Jang,
DaeHun Nyang,
David Mohaisen
Abstract:
Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.
Submitted 26 November, 2023;
originally announced November 2023.
-
Measuring and Modeling the Free Content Web
Authors:
Abdulrahman Alabduljabbar,
Runyu Ma,
Ahmed Abusnaina,
Rhongho Jang,
Songqing Chen,
DaeHun Nyang,
and David Mohaisen
Abstract:
Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level. Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.
Submitted 26 April, 2023;
originally announced April 2023.
-
Understanding the Security and Performance of the Web Presence of Hospitals: A Measurement Study
Authors:
Mohammed Alkinoon,
Abdulrahman Alabduljabbar,
Hattan Althebeiti,
Rhongho Jang,
DaeHun Nyang,
David Mohaisen
Abstract:
Using a total of 4,774 hospitals categorized as government, non-profit, and proprietary hospitals, this study provides the first measurement-based analysis of hospitals' websites and connects the findings with data breaches through a correlation analysis. We study the security attributes of three categories, collectively and in contrast, against domain name, content, and SSL certificate-level features. We find that each type of hospital has a distinctive characteristic of its utilization of domain name registrars, top-level domain distribution, and domain creation distribution, as well as content type and HTTP request features. Security-wise, and consistent with the general population of websites, only 1% of government hospitals utilized DNSSEC, in contrast to 6% of the proprietary hospitals. Alarmingly, we found that 25% of the hospitals used plain HTTP, in contrast to 20% in the general web population. Alarmingly too, we found that 8%-84% of the hospitals, depending on their type, had some malicious contents, which are mostly attributed to the lack of maintenance.
We conclude with a correlation analysis against 414 confirmed and manually vetted hospitals' data breaches. Among other interesting findings, our study highlights that the security attributes highlighted in our analysis of hospital websites are forming a very strong indicator of their likelihood of being breached. Our analyses are the first step towards understanding patient online privacy, highlighting the lack of basic security in many hospitals' websites and opening various potential research directions.
Submitted 26 April, 2023;
originally announced April 2023.
-
SHIELD: Thwarting Code Authorship Attribution
Authors:
Mohammed Abuhamad,
Changhun Jung,
David Mohaisen,
DaeHun Nyang
Abstract:
Authorship attribution has become increasingly accurate, posing a serious privacy risk for programmers who wish to remain anonymous. In this paper, we introduce SHIELD to examine the robustness of different code authorship attribution approaches against adversarial code examples. We define four attacks on attribution techniques, which include targeted and non-targeted attacks, and realize them using adversarial code perturbation. We experiment with a dataset of 200 programmers from the Google Code Jam competition to validate our methods targeting six state-of-the-art authorship attribution methods that adopt a variety of techniques for extracting authorship traits from source code, including RNN, CNN, and code stylometry. Our experiments demonstrate the vulnerability of current authorship attribution methods against adversarial attacks. For the non-targeted attack, the attack success rate exceeds 98.5%, accompanied by a degradation of the identification confidence that exceeds 13%. For the targeted attacks, we show the possibility of impersonating a programmer using targeted adversarial perturbations with a success rate ranging from 66% to 88% for different authorship attribution techniques under several adversarial scenarios.
Submitted 25 April, 2023;
originally announced April 2023.
-
Do Content Management Systems Impact the Security of Free Content Websites? A Correlation Analysis
Authors:
Mohammed Alaqdhi,
Abdulrahman Alabduljabbar,
Kyle Thomas,
Saeed Salem,
DaeHun Nyang,
David Mohaisen
Abstract:
This paper investigates the potential causes of the vulnerabilities of free content websites to address risks and maliciousness. Assembling more than 1,500 websites with free and premium content, we identify their content management system (CMS) and malicious attributes. We use frequency analysis at both the aggregate and per category of content (books, games, movies, music, and software), utilizing the unpatched vulnerabilities, total vulnerabilities, malicious count, and percentiles to uncover trends and affinities of usage and maliciousness of CMSs and their contribution to those websites. Moreover, we find that, despite the significant number of custom code websites, the use of CMSs is pervasive, with varying trends across types and categories. Finally, we find that even a small number of unpatched vulnerabilities in popular CMSs could be a potential cause for significant maliciousness.
Submitted 21 October, 2022;
originally announced October 2022.
-
Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions
Authors:
Marwan Omar,
Soohyeon Choi,
DaeHun Nyang,
David Mohaisen
Abstract:
Recent natural language processing (NLP) techniques have accomplished high performance on benchmark datasets, primarily due to the significant improvement in the performance of deep learning. The advances in the research community have led to great enhancements in state-of-the-art production systems for NLP tasks, such as virtual assistants, speech recognition, and sentiment analysis. However, such NLP systems still often fail when tested with adversarial attacks. The initial lack of robustness exposed troubling gaps in current models' language understanding capabilities, creating problems when NLP systems are deployed in real life. In this paper, we present a structured overview of NLP robustness research by summarizing the literature in a systematic way across various dimensions. We then take a deep dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks. Finally, we argue that robustness should be multi-dimensional, provide insights into current research, and identify gaps in the literature to suggest directions worth pursuing to address these gaps.
Submitted 3 January, 2022;
originally announced January 2022.
-
Count-Less: A Counting Sketch for the Data Plane of High Speed Switches
Authors:
SunYoung Kim,
Changhun Jung,
RhongHo Jang,
David Mohaisen,
DaeHun Nyang
Abstract:
Demands are increasing to measure per-flow statistics in the data plane of high-speed switches. Measuring flows with exact counting is infeasible due to processing and memory constraints, but a sketch is a promising candidate for collecting approximate per-flow statistics in the data plane in real time. Among them, Count-Min sketch is a versatile tool to measure spectral density of high-volume data using a small amount of memory and low processing overhead. Due to its simplicity and versatility, Count-Min sketch and its variants have been adopted in many works as a standalone or even as a supporting measurement tool. However, Count-Min's estimation accuracy is limited owing to its data structure not fully accommodating Zipfian distribution and the indiscriminate update algorithm without considering a counter value. This in turn degrades the accuracy of heavy hitter, heavy changer, cardinality, and entropy. To enhance the measurement accuracy of Count-Min, there have been many and various attempts. One of the most notable approaches is to cascade multiple sketches in a sequential manner so that either mouse or elephant flows are filtered to separate elephants from mouse flows, such as Elastic sketch (an elephant filter leveraging TCAM + Count-Min) and FCM sketch (Count-Min-based layered mouse filters). In this paper, we first show that these cascaded filtering approaches adopting a Pyramid-shaped data structure (allocating more counters for mouse flows) still suffer from under-utilization of memory, which gives us room for better estimation. To this end, we are facing two challenges: one is (a) how to make Count-Min's data structure accommodate Zipfian distribution more effectively, and the other is (b) how to make update and query work without delaying packet processing in the switch's data plane. Count-Less adopts a different combination ...
Submitted 4 November, 2021;
originally announced November 2021.
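For readers unfamiliar with the baseline named in the Count-Less abstract above, the following is a minimal sketch of a plain Count-Min sketch, not of Count-Less itself. The class name `CountMinSketch`, the `width` and `depth` defaults, and the choice of salted BLAKE2b as the per-row hash are assumptions made for illustration; a switch data plane would use hardware hash units rather than Python.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min sketch: depth rows of width counters (illustrative only)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, obtained by salting with the row id.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, count=1):
        # Indiscriminate update: every row's counter is incremented,
        # regardless of the counter's current value.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key):
        # Hash collisions can only inflate counters, so the minimum
        # across rows is the tightest available estimate.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=3)
    for flow, packets in [("10.0.0.1->10.0.0.2", 5), ("10.0.0.3->10.0.0.4", 2)]:
        sketch.update(flow, packets)
    print(sketch.query("10.0.0.1->10.0.0.2"))
```

The estimate returned by `query` never falls below the true count; it can exceed it when flows collide in every row, which is the accuracy limitation the abstract refers to.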
-
ML-based IoT Malware Detection Under Adversarial Settings: A Systematic Evaluation
Authors:
Ahmed Abusnaina,
Afsah Anwar,
Sultan Alshamrani,
Abdulrahman Alabduljabbar,
RhongHo Jang,
Daehun Nyang,
David Mohaisen
Abstract:
The rapid growth of the Internet of Things (IoT) devices is paralleled by them being on the front line of malicious attacks. This has led to an explosion in the number of IoT malware, with continued mutations, evolution, and sophistication. Such malicious software is detected using machine learning (ML) algorithms alongside the traditional signature-based methods. Although ML-based detectors improve the detection performance, they are susceptible to malware evolution and sophistication, making them limited to the patterns that they have been trained upon. This continuous trend motivates the large body of literature on malware analysis and detection research, with many systems emerging constantly and outperforming their predecessors. In this work, we systematically examine the state-of-the-art malware detection approaches that utilize various representation and learning techniques, under a range of adversarial settings. Our analyses highlight the instability of the proposed detectors in learning patterns that distinguish the benign from the malicious software. The results show that software mutations with functionality-preserving operations, such as stripping and padding, significantly deteriorate the accuracy of such detectors. Additionally, our analysis of the industry-standard malware detectors shows their instability to the malware mutations.
Submitted 30 August, 2021;
originally announced August 2021.
-
ShellCore: Automating Malicious IoT Software Detection by Using Shell Commands Representation
Authors:
Hisham Alasmary,
Afsah Anwar,
Ahmed Abusnaina,
Abdulrahman Alabduljabbar,
Mohammad Abuhamad,
An Wang,
DaeHun Nyang,
Amro Awad,
David Mohaisen
Abstract:
The Linux shell is a command-line interpreter that provides users with a command interface to the operating system, allowing them to perform a variety of functions. Although very useful in building capabilities at the edge, the Linux shell can be exploited, giving adversaries a prime opportunity to use them for malicious activities. With access to IoT devices, malware authors can abuse the Linux shell of those devices to propagate infections and launch large-scale attacks, e.g., DDoS. In this work, we provide a first look at shell commands used in Linux-based IoT malware towards detection. We analyze malicious shell commands found in IoT malware and build a neural network-based model, ShellCore, to detect malicious shell commands. Namely, we collected a large dataset of shell commands, including malicious commands extracted from 2,891 IoT malware samples and benign commands collected from real-world network traffic analysis and volunteered data from Linux users. Using conventional machine and deep learning-based approaches trained with term- and character-level features, ShellCore is shown to achieve an accuracy of more than 99% in detecting malicious shell commands and files (i.e., binaries).
Submitted 25 March, 2021;
originally announced March 2021.
-
Understanding Internet of Things Malware by Analyzing Endpoints in their Static Artifacts
Authors:
Afsah Anwar,
Jinchun Choi,
Abdulrahman Alabduljabbar,
Hisham Alasmary,
Jeffrey Spaulding,
An Wang,
Songqing Chen,
DaeHun Nyang,
Amro Awad,
David Mohaisen
Abstract:
The lack of security measures among the Internet of Things (IoT) devices and their persistent online connection gives adversaries a prime opportunity to target them or even abuse them as intermediary targets in larger attacks such as distributed denial-of-service (DDoS) campaigns. In this paper, we analyze IoT malware and focus on the endpoints reachable on the public Internet, that play an essential part in the IoT malware ecosystem. Namely, we analyze endpoints acting as dropzones and their targets to gain insights into the underlying dynamics in this ecosystem, such as the affinity between the dropzones and their target IP addresses, and the different patterns among endpoints. Towards this goal, we reverse-engineer 2,423 IoT malware samples and extract strings from them to obtain IP addresses. We further gather information about these endpoints from public Internet-wide scanners, such as Shodan and Censys. For the masked IP addresses, we examine the Classless Inter-Domain Routing (CIDR) networks accumulating to more than 100 million (78.2% of total active public IPv4 addresses) endpoints. Our investigation from four different perspectives provides profound insights into the role of endpoints in IoT malware attacks, which deepens our understanding of IoT malware ecosystems and can assist future defenses.
Submitted 25 March, 2021;
originally announced March 2021.
-
Hate, Obscenity, and Insults: Measuring the Exposure of Children to Inappropriate Comments in YouTube
Authors:
Sultan Alshamrani,
Ahmed Abusnaina,
Mohammed Abuhamad,
Daehun Nyang,
David Mohaisen
Abstract:
Social media has become an essential part of the daily routines of children and adolescents. Moreover, enormous efforts have been made to ensure the psychological and emotional well-being of young users as well as their safety when interacting with various social media platforms. In this paper, we investigate the exposure of those users to inappropriate comments posted on YouTube videos targeting this demographic. We collected a large-scale dataset of approximately four million records and studied the presence of five age-inappropriate categories and the amount of exposure to each category. Using natural language processing and machine learning techniques, we constructed ensemble classifiers that achieved high accuracy in detecting inappropriate comments. Our results show a large percentage of worrisome comments with inappropriate content: we found 11% of the comments on children's videos to be toxic, highlighting the importance of monitoring comments, particularly on children's platforms.
Submitted 3 March, 2021;
originally announced March 2021.
-
e-PoS: Making Proof-of-Stake Decentralized and Fair
Authors:
Muhammad Saad,
Zhan Qin,
Kui Ren,
DaeHun Nyang,
David Mohaisen
Abstract:
Blockchain applications that rely on the Proof-of-Work (PoW) have increasingly become energy inefficient with a staggering carbon footprint. In contrast, energy-efficient alternative consensus protocols such as Proof-of-Stake (PoS) may cause centralization and unfairness in the blockchain system. To address these challenges, we propose a modular version of PoS-based blockchain systems called e-PoS that resists the centralization of network resources by extending mining opportunities to a wider set of stakeholders. Moreover, e-PoS leverages the in-built system operations to promote fair mining practices by penalizing malicious entities. We validate e-PoS's achievable objectives through theoretical analysis and simulations. Our results show that e-PoS ensures fairness and decentralization, and can be applied to existing blockchain applications.
Submitted 1 January, 2021;
originally announced January 2021.
-
Generating Adversarial Examples with an Optimized Quality
Authors:
Aminollah Khormali,
DaeHun Nyang,
David Mohaisen
Abstract:
Deep learning models are widely used in a range of application areas, such as computer vision, computer security, etc. However, deep learning models are vulnerable to Adversarial Examples (AEs), carefully crafted samples to deceive those models. Recent studies have introduced new adversarial attack methods, but, to the best of our knowledge, none provided guaranteed quality for the crafted examples as part of their creation, beyond simple quality measures such as Misclassification Rate (MR). In this paper, we incorporate Image Quality Assessment (IQA) metrics into the design and generation process of AEs. We propose evolutionary-based single- and multi-objective optimization approaches that generate AEs with a high misclassification rate and explicitly improve the quality, thus indistinguishability, of the samples, while perturbing only a limited number of pixels. In particular, several IQA metrics, including edge analysis, Fourier analysis, and feature descriptors, are leveraged into the process of generating AEs. Unique characteristics of the evolutionary-based algorithm enable us to simultaneously optimize the misclassification rate and the IQA metrics of the AEs. In order to evaluate the performance of the proposed method, we conduct intensive experiments on different well-known benchmark datasets (MNIST, CIFAR, GTSRB, and Open Image Dataset V5), while considering various objective optimization configurations. The results obtained from our experiments, when compared with the existing attack methods, validate our initial hypothesis that the use of IQA metrics within the generation process of AEs can substantially improve their quality, while maintaining a high misclassification rate. Finally, transferability and human perception studies are provided, demonstrating acceptable performance.
Submitted 30 June, 2020;
originally announced July 2020.
-
A Deep Learning-based Fine-grained Hierarchical Learning Approach for Robust Malware Classification
Authors:
Ahmed Abusnaina,
Mohammed Abuhamad,
Hisham Alasmary,
Afsah Anwar,
Rhongho Jang,
Saeed Salem,
DaeHun Nyang,
David Mohaisen
Abstract:
The wide acceptance of Internet of Things (IoT) for both household and industrial applications is accompanied by several security concerns. A major security concern is their probable abuse by adversaries towards their malicious intent. Understanding and analyzing IoT malicious behaviors is crucial, especially with their rapid growth and adoption in a wide range of applications. However, recent studies have shown that machine learning-based approaches are susceptible to adversarial attacks by adding junk codes to the binaries, for example, with an intention to fool those machine learning or deep learning-based detection systems. Realizing the importance of addressing this challenge, this study proposes a malware detection system that is robust to adversarial attacks. To do so, we examine the performance of the state-of-the-art methods against adversarial IoT software crafted using the graph embedding and augmentation techniques. In particular, we study the robustness of such methods against two black-box adversarial methods, GEA and SGEA, to generate Adversarial Examples (AEs) with reduced overhead, while keeping their practicality intact. Our comprehensive experimentation with GEA-based AEs shows the relation between misclassification and the graph size of the injected sample. Upon optimization and with small perturbation, by use of SGEA, all the IoT malware samples are misclassified as benign. This highlights the vulnerability of current detection systems under adversarial settings. With the landscape of possible adversarial attacks, we then propose DL-FHMC, a fine-grained hierarchical learning approach for malware detection and classification that is robust to AEs, with a capability to detect 88.52% of the malicious AEs.
Submitted 15 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Contra-*: Mechanisms for Countering Spam Attacks on Blockchain's Memory Pools
Authors:
Muhammad Saad,
Joongheon Kim,
DaeHun Nyang,
David Mohaisen
Abstract:
Blockchain-based cryptocurrencies, such as Bitcoin, have been on the rise in their popularity and value, making them a target of several forms of Denial-of-Service (DoS) attacks, and calling for a better understanding of their attack surface from both security and distributed systems standpoints. In this paper, and in the pursuit of understanding the attack surface of blockchains, we explore a new form of attack that can be carried out on the memory pools (mempools) and mainly targets blockchain-based cryptocurrencies. We study this attack on the Bitcoin mempool and explore the attack effects on transaction fees paid by benign users. To counter this attack, this paper further proposes Contra-*, a set of countermeasures utilizing fee, age, and size (thus, Contra-F, Contra-A, and Contra-S) as prioritization mechanisms. Contra-* optimizes the mempool size and helps in countering the effects of DoS attacks due to spam transactions. We evaluate Contra-* by simulations and analyze their effectiveness under various attack conditions.
Submitted 1 January, 2021; v1 submitted 10 May, 2020;
originally announced May 2020.
-
Sensor-based Continuous Authentication of Smartphones' Users Using Behavioral Biometrics: A Contemporary Survey
Authors:
Mohammed Abuhamad,
Ahmed Abusnaina,
DaeHun Nyang,
David Mohaisen
Abstract:
Mobile devices and technologies have become increasingly popular, offering comparable storage and computational capabilities to desktop computers allowing users to store and interact with sensitive and private information. The security and protection of such personal information are becoming more and more important since mobile devices are vulnerable to unauthorized access or theft. User authentication is a task of paramount importance that grants access to legitimate users at the point-of-entry and continuously through the usage session. This task is made possible with today's smartphones' embedded sensors that enable continuous and implicit user authentication by capturing behavioral biometrics and traits. In this paper, we survey more than 140 recent behavioral biometric-based approaches for continuous user authentication, including motion-based methods (28 studies), gait-based methods (19 studies), keystroke dynamics-based methods (20 studies), touch gesture-based methods (29 studies), voice-based methods (16 studies), and multimodal-based methods (34 studies). The survey provides an overview of the current state-of-the-art approaches for continuous user authentication using behavioral biometrics captured by smartphones' embedded sensors, including insights and open challenges for adoption, usability, and performance.
Submitted 10 May, 2020; v1 submitted 23 January, 2020;
originally announced January 2020.
-
W-Net: A CNN-based Architecture for White Blood Cells Image Classification
Authors:
Changhun Jung,
Mohammed Abuhamad,
Jumabek Alikhanov,
Aziz Mohaisen,
Kyungja Han,
DaeHun Nyang
Abstract:
Computer-aided methods for analyzing white blood cells (WBC) have become widely popular due to the complexity of the manual process. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells remains a challenge and is in high demand, as the distribution of the five WBC types reflects the condition of the immune system. This work proposes W-Net, a CNN-based method for WBC classification. We evaluate W-Net on a real-world large-scale dataset, obtained from The Catholic University of Korea, that includes 6,562 real images of the five WBC types. W-Net achieves an average accuracy of 97%.
Submitted 2 October, 2019;
originally announced October 2019.
-
COPYCAT: Practical Adversarial Attacks on Visualization-Based Malware Detection
Authors:
Aminollah Khormali,
Ahmed Abusnaina,
Songqing Chen,
DaeHun Nyang,
Aziz Mohaisen
Abstract:
Despite many attempts, state-of-the-art adversarial machine learning attacks on malware detection systems generally yield unexecutable samples. In this work, we set out to examine the robustness of visualization-based malware detection systems against adversarial examples (AEs) that not only fool the model but also maintain the executability of the original input. As such, we first investigate the application of existing off-the-shelf adversarial attack approaches to malware detection systems, through which we found that those approaches do not necessarily maintain the functionality of the original inputs. Therefore, we propose COPYCAT, an approach to generating adversarial examples that is specifically designed for malware detection systems with two main goals: achieving a high misclassification rate and maintaining the executability and functionality of the original input. We designed two main configurations for COPYCAT, namely AE padding and sample injection. While the first configuration results in untargeted misclassification attacks, the sample injection configuration is able to force the model to generate a targeted output, which is highly desirable in the malware attribution setting. We evaluate the performance of COPYCAT through an extensive set of experiments on two malware datasets, and report that we were able to generate adversarial samples that are misclassified at a rate of 98.9% and 96.5% with Windows and IoT binary datasets, respectively, outperforming the misclassification rates in the literature. Most importantly, we report that those AEs were executable, unlike AEs generated by off-the-shelf approaches. Our transferability study demonstrates that the AEs generated through our proposed method can be generalized to other models.
Submitted 20 September, 2019;
originally announced September 2019.
-
Exploring the Attack Surface of Blockchain: A Systematic Overview
Authors:
Muhammad Saad,
Jeffrey Spaulding,
Laurent Njilla,
Charles Kamhoua,
Sachin Shetty,
DaeHun Nyang,
Aziz Mohaisen
Abstract:
In this paper, we systematically explore the attack surface of the Blockchain technology, with an emphasis on public Blockchains. Towards this goal, we attribute attack viability in the attack surface to 1) the Blockchain cryptographic constructs, 2) the distributed architecture of the systems using Blockchain, and 3) the Blockchain application context. For each of these contributing factors, we outline several attacks, including selfish mining, the 51% attack, Domain Name System (DNS) attacks, distributed denial-of-service (DDoS) attacks, consensus delay (due to selfish behavior or distributed denial-of-service attacks), Blockchain forks, orphaned and stale blocks, block ingestion, wallet thefts, smart contract attacks, and privacy attacks. We also explore the causal relationships between these attacks to demonstrate how various attack vectors are connected to one another. A secondary contribution of this work is outlining effective defense measures taken by the Blockchain technology or proposed by researchers to mitigate the effects of these attacks and patch associated vulnerabilities.
Submitted 6 April, 2019;
originally announced April 2019.
-
Scaling Up Anomaly Detection Using In-DRAM Working Set of Active Flows Table
Authors:
Rhongho Jang,
Seongkwang Moon,
Youngtae Noh,
Aziz Mohaisen,
DaeHun Nyang
Abstract:
In the zettabyte era, per-flow measurement becomes more challenging owing to the growth of both traffic volumes and the number of flows. Also, swift detection of anomalies (e.g., DDoS attacks, congestion, link failures, and so on) becomes paramount. For fast and accurate anomaly detection, managing an accurate working set of active flows (WSAF) from massive volumes of packet influxes at line rates is a key challenge. WSAF is usually located in a very fast but expensive memory, such as TCAM or SRAM, and thus the number of entries to be stored is quite limited. To cope with the scalability issue of WSAF, we propose to use In-DRAM WSAF with scales, and put a compact data structure called FlowRegulator in front of WSAF to compensate for DRAM's slow access time by substantially reducing massive influxes to WSAF without compromising measurement accuracy. We prototyped and evaluated our system in a large-scale real-world experiment (connected to the monitoring port of our campus main gateway router for 113 hours and capturing 122.3 million flows). As one key application, FlowRegulator detected heavy hitters with 99.8% accuracy.
Submitted 11 February, 2019;
originally announced February 2019.
-
Analyzing, Comparing, and Detecting Emerging Malware: A Graph-based Approach
Authors:
Hisham Alasmary,
Aminollah Khormali,
Afsah Anwar,
Jeman Park,
Jinchun Choi,
DaeHun Nyang,
Aziz Mohaisen
Abstract:
The growth in the number of Android and Internet of Things (IoT) devices has been accompanied by a parallel increase in malicious software (malware), calling for new analysis approaches. We represent binaries using the graph properties of their Control Flow Graph (CFG) structure and conduct an in-depth analysis of malicious graphs extracted from Android and IoT malware to understand their differences. Using 2,874 and 2,891 malware binaries corresponding to IoT and Android samples, respectively, we analyze both general characteristics and graph algorithmic properties. Using the CFG as an abstract structure, we then emphasize various interesting findings, such as the prevalence of unreachable code in Android malware, indicated by the multiple components in their CFGs, and the larger number of nodes in Android malware compared to IoT malware, highlighting a higher order of complexity. We implement machine learning-based classifiers to distinguish IoT malware from benign samples, achieving an accuracy of 97.9% using Random Forests (RF).
Submitted 11 February, 2019;
originally announced February 2019.
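The abstract above treats multiple components in a CFG as a sign of unreachable code. A minimal sketch of that check follows, assuming a CFG is given as a node count plus a list of directed edges (a hypothetical representation, not the authors' tooling). It counts weakly connected components with union-find; more than one component means some blocks cannot be reached from the entry block.

```python
def count_components(num_nodes, edges):
    """Count weakly connected components of a CFG using union-find.

    num_nodes: number of basic blocks, labeled 0..num_nodes-1 (assumed layout).
    edges: iterable of (src, dst) pairs; direction is ignored here.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    return len({find(i) for i in range(num_nodes)})


# Blocks 3 and 4 are disconnected from the entry block 0.
example_edges = [(0, 1), (1, 2), (3, 4)]
component_count = count_components(5, example_edges)
```

A count of one is necessary but not sufficient for full reachability, since a block can share a component with the entry block and still be unreachable along directed edges.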
-
Analyzing Endpoints in the Internet of Things Malware
Authors:
Jinchun Choi,
Afsah Anwar,
Hisham Alasmary,
Jeffrey Spaulding,
DaeHun Nyang,
Aziz Mohaisen
Abstract:
The lack of security measures in Internet of Things (IoT) devices and their persistent online connectivity give adversaries an opportunity to target them or abuse them as intermediary targets for larger attacks such as distributed denial-of-service (DDoS) campaigns. In this paper, we analyze IoT malware with a focus on endpoints to understand the affinity between the dropzones and their target IP addresses, and to understand the different patterns among them. Towards this goal, we reverse-engineer 2,423 IoT malware samples to obtain IP addresses. We further augment the endpoints with additional information from Internet-wide scanners, including Shodan and Censys. We then perform a deep data-driven analysis of the dropzones and their target IP addresses and further examine the attack surface of the target device space.
Submitted 9 February, 2019;
originally announced February 2019.
-
Network-based Analysis and Classification of Malware using Behavioral Artifacts Ordering
Authors:
Aziz Mohaisen,
Omar Alrawi,
Jeman Park,
Joongheon Kim,
DaeHun Nyang,
Manar Mohaisen
Abstract:
Using runtime execution artifacts to identify malware and its associated family is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, the use of these fine-granularity data points makes these techniques computationally expensive. Moreover, the signatures and heuristics are often circumvented by subsequent malware authors. In this work, we propose Chatter, a system that is concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet, and execution traces are captured via terse concatenations of those letters. Then, leveraging an analyst-labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier predicting malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form, only network events are considered and eleven malware families are used. We show the technique achieves 83%-94% accuracy in isolation and makes non-trivial performance improvements when integrated with a baseline classifier of combined order features to reach an accuracy of up to 98.8%.
Submitted 4 January, 2019;
originally announced January 2019.
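The pipeline in the Chatter abstract (events mapped to letters, traces as strings, n-gram classification) can be illustrated with a minimal sketch. The event names, the alphabet, and the nearest-profile cosine classifier below are all assumptions made for illustration; the abstract does not specify the alphabet or the classifier used.

```python
from collections import Counter
import math

# Hypothetical event-to-letter alphabet (not taken from the paper).
ALPHABET = {"dns_query": "D", "tcp_connect": "T", "http_get": "G", "http_post": "P"}


def encode(events):
    """Concatenate one letter per high-level event, preserving order."""
    return "".join(ALPHABET[e] for e in events)


def ngrams(trace, n=2):
    """Count overlapping character n-grams in an encoded trace."""
    return Counter(trace[i:i + n] for i in range(len(trace) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_traces, n=2):
    """Sum n-gram counts per family to form one profile per label."""
    profiles = {}
    for family, events in labeled_traces:
        profiles.setdefault(family, Counter()).update(ngrams(encode(events), n))
    return profiles


def predict(profiles, events, n=2):
    """Return the family whose profile is most similar to the trace."""
    grams = ngrams(encode(events), n)
    return max(profiles, key=lambda family: cosine(profiles[family], grams))


# Toy labeled corpus with two made-up families.
corpus = [
    ("famA", ["dns_query", "tcp_connect", "http_get", "http_get"]),
    ("famB", ["tcp_connect", "http_post", "http_post", "tcp_connect"]),
]
profiles = train(corpus)
guess = predict(profiles, ["dns_query", "tcp_connect", "http_get"])
```

Because only the order of events is encoded, a trace costs one character per event, which reflects the abstract's argument against fine-granularity features.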
-
Gruut: A Fully-Decentralized P2P Public Ledger
Authors:
DaeHun Nyang
Abstract:
Owing to Satoshi Nakamoto's brilliant idea, a P2P public ledger is shown to be implementable in an anonymous network. Any Internet user can then join the anonymous network and contribute to the P2P public ledger by providing their computing power or proof-of-work. The proof-of-work is a clever implementation of one-CPU-one-vote by anonymous participants, and it protects the Bitcoin ledger from illegal modification. To compensate the nodes for their work, a cryptocurrency called Bitcoin is issued and given to nodes. However, the very nature of anonymity of the ledger and the cryptocurrency prevents the technology from being used in a fiat money economy. Cryptocurrencies are not traceable even if they are used for money laundering or tax evasion, and the value of cryptocurrencies is not stable but fluctuates wildly. In this white paper, we introduce Gruut, a P2P ledger to implement a universal financial platform for fiat money. For this purpose, we introduce a new consensus algorithm called "proof-of-population," which is one instance of "proof of public collaboration." It can be used for multiple purposes: as a P2P ledger for banks, as a powerful tool for payment, including micropayment, and as a tool for any type of financial transaction. Even better, it distributes the profit obtained from transaction fees, currently dominated by a third party, to peers that cannot be centralized. Energy requirements of Gruut are so low that it is possible to run our software on a smartphone or on a personal computer without a graphics card.
Submitted 29 June, 2018;
originally announced June 2018.
-
Decryptable to Your Eyes: Visualization of Security Protocols at the User Interface
Authors:
DaeHun Nyang,
Abedelaziz Mohaisen,
Taekyoung Kwon,
Brent Kang,
Angelos Stavrou
Abstract:
The design of authentication protocols, for online banking services in particular and any service that is of sensitive nature in general, is quite challenging. Indeed, enforcing security guarantees incurs overhead, thus imposing additional computation and design considerations that do not always meet usability and user requirements. On the other hand, relaxing assumptions and rigorous security design to improve the user experience can lead to security breaches that can harm the users' trust in the system.
In this paper, we demonstrate how careful visualization design can enhance not only the security but also the usability of the authentication process. To that end, we propose a family of visualized authentication protocols, a visualized transaction verification, and a "decryptable to your eyes only" protocol. Through rigorous analysis, we verify that our protocols are immune to many of the challenging authentication attacks applicable in the literature. Furthermore, using an extensive case study on a prototype of our protocols, we highlight the potential of our approach for real-world deployment: we were able to achieve a high level of usability while satisfying stringent security requirements.
Submitted 9 December, 2011;
originally announced December 2011.
-
Privacy in Location Based Services: Primitives Toward the Solution
Authors:
Abedelaziz Mohaisen,
Dowon Hong,
DaeHun Nyang
Abstract:
Location based services (LBS) are one of the most promising and innovative directions of convergence technologies, resulting from the convergence of several fields including database systems, mobile communication, Internet technology, and positioning systems. Although initiated as early as the mid-1990s, LBS has only recently received systematic research interest due to its commercial and technological impact. Because LBS relies on the user's location, which can be used to trace the user's activities, strong privacy concerns have been raised. To preserve the privacy of the user's location, several works have been introduced, though many challenges still await solutions. This paper presents a survey of LBS systems, considering localization technologies as well as models and architectures that guarantee privacy. We also review cryptographic primitives that could be used to preserve privacy in LBS, followed by promising research directions concerned with the privacy issue.
Submitted 15 March, 2009;
originally announced March 2009.
-
Hierarchical Grid-Based Pairwise Key Pre-distribution in Wireless Sensor Networks
Authors:
Abedelaziz Mohaisen,
DaeHun Nyang,
KyungHee Lee
Abstract:
The security of wireless sensor networks is an active topic of research where both symmetric and asymmetric key cryptography issues have been studied. Due to their computational feasibility on typical sensor nodes, symmetric key algorithms that use the same key to encrypt and decrypt messages have been intensively studied and successfully deployed in such environments. Because of the wireless sensor's limited infrastructure, the main bottleneck in deploying these algorithms is key distribution. For the same reason of resource restrictions, key distribution mechanisms used in traditional wireless networks are not efficient for sensor networks.
To overcome the key distribution problem, several key pre-distribution algorithms and techniques that assign keys or keying material to the network nodes in an offline phase have been introduced recently. In this paper, we introduce a supplemental distribution technique based on communication pattern and deployment knowledge modeling. Our technique is based on the hierarchical grid deployment. To provide a security level proportional to the number of dependent sensors, we use different polynomials of different orders with different weights. To assess the value of the proposed work, we provide a detailed analysis of the resources used and the resulting security, resiliency, and connectivity, compared with related work.
Submitted 7 March, 2008;
originally announced March 2008.
-
A Survey on Deep Packet Inspection for Intrusion Detection Systems
Authors:
Tamer AbuHmed,
Abedelaziz Mohaisen,
DaeHun Nyang
Abstract:
Deep packet inspection is widely recognized as a powerful technique used by intrusion detection systems for inspecting, deterring, and deflecting malicious attacks over the network. Fundamentally, almost all intrusion detection systems can search through packets and identify contents that match known attacks. In this paper, we survey deep packet inspection implementation techniques, research challenges, and algorithms. Finally, we provide a comparison of the different applied systems.
Submitted 29 February, 2008;
originally announced March 2008.