Search | arXiv e-print repository

An accurate IoT Intrusion Detection Framework using Apache Spark

Authors: Mohamed Abushwereb, Mouhammd Alkasassbeh, Mohammad Almseidin, Muhannad Mustafa

Abstract: The internet has caused tremendous changes since its appearance in the 1980s, and now, the Internet of Things (IoT) seems to be doing the same. The potential of IoT has made it the center of attention for many people, but, where some see an opportunity to contribute, others may see IoT networks as a target to be exploited. The high number of IoT devices makes them the perfect setup for staging den… ▽ More The internet has caused tremendous changes since its appearance in the 1980s, and now, the Internet of Things (IoT) seems to be doing the same. The potential of IoT has made it the center of attention for many people, but, where some see an opportunity to contribute, others may see IoT networks as a target to be exploited. The high number of IoT devices makes them the perfect setup for staging denial-of-service attacks (DoS) that can have devastating consequences. This renders the need for cybersecurity measures such as intrusion detection systems (IDSs) evident. The aim of this paper is to build an IDS using the big data platform, Apache Spark. Apache Spark was used along with its ML library (MLlib) and the BoT-IoT dataset. The IDS was then tested and evaluated based on F-Measure (f1), as was the standard when evaluating imbalanced data. Two rounds of tests were performed, a partial dataset for minimizing bias, and the full BoT-IoT dataset for exploring big data and ML capabilities in a security setting. For the partial dataset, the Random Forest algorithm had the highest performance for binary classification at an average f1 measure of 99.7%, as well as 99.6% for main category classification, and an 88.5% f1 measure for sub category classification. As for the complete dataset, the Decision Tree algorithm scored the highest f1 measures for all conducted tests; 97.9% for binary classification, 79% for main category classification, and 77% for sub category classification. △ Less

Submitted 21 February, 2022; originally announced March 2022.

Comments: 15 pages

arXiv:2107.12299 [pdf]

Anomaly-based Intrusion Detection System Using Fuzzy Logic

Authors: Mohammad Almseidin, Jamil Al-Sawwa, Mouhammd Alkasassbeh

Abstract: Recently, the Distributed Denial of Service (DDOS) attacks has been used for different aspects to denial the number of services for the end-users. Therefore, there is an urgent need to design an effective detection method against this type of attack. A fuzzy inference system offers the results in a more readable and understandable form. This paper introduces an anomaly-based Intrusion Detection (I… ▽ More Recently, the Distributed Denial of Service (DDOS) attacks has been used for different aspects to denial the number of services for the end-users. Therefore, there is an urgent need to design an effective detection method against this type of attack. A fuzzy inference system offers the results in a more readable and understandable form. This paper introduces an anomaly-based Intrusion Detection (IDS) system using fuzzy logic. The fuzzy logic inference system implemented as a detection method for Distributed Denial of Service (DDOS) attacks. The suggested method was applied to an open-source DDOS dataset. Experimental results show that the anomaly-based Intrusion Detection system using fuzzy logic obtained the best result by utilizing the InfoGain features selection method besides the fuzzy inference system, the results were 91.1% for the true-positive rate and 0.006% for the false-positive rate. △ Less

Submitted 22 June, 2021; originally announced July 2021.

arXiv:2106.12545 [pdf]

Diabetic Retinopathy Detection using Ensemble Machine Learning

Authors: Israa Odeh, Mouhammd Alkasassbeh, Mohammad Alauthman

Abstract: Diabetic Retinopathy (DR) is among the worlds leading vision loss causes in diabetic patients. DR is a microvascular disease that affects the eye retina, which causes vessel blockage and therefore cuts the main source of nutrition for the retina tissues. Treatment for this visual disorder is most effective when it is detected in its earliest stages, as severe DR can result in irreversible blindnes… ▽ More Diabetic Retinopathy (DR) is among the worlds leading vision loss causes in diabetic patients. DR is a microvascular disease that affects the eye retina, which causes vessel blockage and therefore cuts the main source of nutrition for the retina tissues. Treatment for this visual disorder is most effective when it is detected in its earliest stages, as severe DR can result in irreversible blindness. Nonetheless, DR identification requires the expertise of Ophthalmologists which is often expensive and time-consuming. Therefore, automatic detection systems were introduced aiming to facilitate the identification process, making it available globally in a time and cost-efficient manner. However, due to the limited reliable datasets and medical records for this particular eye disease, the obtained predictions accuracies were relatively unsatisfying for eye specialists to rely on them as diagnostic systems. Thus, we explored an ensemble-based learning strategy, merging a substantial selection of well-known classification algorithms in one sophisticated diagnostic model. The proposed framework achieved the highest accuracy rates among all other common classification algorithms in the area. 4 subdatasets were generated to contain the top 5 and top 10 features of the Messidor dataset, selected by InfoGainEval. and WrapperSubsetEval., accuracies of 70.7% and 75.1% were achieved on the InfoGainEval. top 5 and original dataset respectively. The results imply the impressive performance of the subdataset, which significantly conduces to a less complex classification process △ Less

Submitted 22 June, 2021; originally announced June 2021.

arXiv:2010.13852 [pdf]

A State-of-the-Art Review on IoT botnet Attack Detection

Authors: Zainab Al-Othman, Mouhammd Alkasassbeh, Sherenaz AL-Haj Baddar

Abstract: The Internet as we know it Today, comprises several fundamental interrelated networks, among which is the Internet of Things (IoT). Despite their versatility, several IoT devices are vulnerable from a security perspective, which renders them as a favorable target for multiple security breaches, especially botnet attacks. In this study, the conceptual frameworks of IoT botnet attacks will be explor… ▽ More The Internet as we know it Today, comprises several fundamental interrelated networks, among which is the Internet of Things (IoT). Despite their versatility, several IoT devices are vulnerable from a security perspective, which renders them as a favorable target for multiple security breaches, especially botnet attacks. In this study, the conceptual frameworks of IoT botnet attacks will be explored, alongside several machinelearning based botnet detection techniques. This study also analyzes and contrasts several botnet Detection techniques based on the Bot-IoT Dataset; a recent realistic IoT dataset that comprises state-of-the-art IoT botnet attack scenarios. △ Less

Submitted 2 October, 2020; originally announced October 2020.

Comments: NA

arXiv:2002.07223 [pdf]

Intelligent Methods for Accurately Detecting Phishing Websites

Authors: Almaha Abuzuraiq, Mouhammd Alkasassbeh, Mohammad Almseidin

Abstract: With increasing technology developments, there is a massive number of websites with varying purposes. But a particular type exists within this large collection, the so-called phishing sites which aim to deceive their users. The main challenge in detecting phishing websites is discovering the techniques that have been used. Where phishers are continually improving their strategies and creating web… ▽ More With increasing technology developments, there is a massive number of websites with varying purposes. But a particular type exists within this large collection, the so-called phishing sites which aim to deceive their users. The main challenge in detecting phishing websites is discovering the techniques that have been used. Where phishers are continually improving their strategies and creating web pages that can protect themselves against many forms of detection methods. Therefore, it is very necessary to develop reliable, active and contemporary methods of phishing detection to combat the adaptive techniques used by phishers. In this paper, different phishing detection approaches are reviewed by classifying them into three main groups. Then, the proposed model is presented in two stages. In the first stage, different machine learning algorithms are applied to validate the chosen dataset and applying features selection methods on it. Thus, the best accuracy was achieved by utilizing only 20 features out of 48 features combined with Random Forest is 98.11%. While in the second stage, the same dataset is applied to various fuzzy logic algorithms. As well the experimental results from the application of Fuzzy logic algorithms were incredible. Where in applying the FURIA algorithm with only five features the accuracy rate was 99.98%. Finally, comparison and discussion of the results between applying machine learning algorithms and fuzzy logic algorithms is done. Where the performance of using fuzzy logic algorithms exceeds the use of machine learning algorithms. △ Less

Submitted 19 January, 2020; originally announced February 2020.

arXiv:1909.02547 [pdf]

Collecting MIB Data from Network Managed by SNMP using Multi Mobile Agents

Authors: Nisreen Madi, Mouhammd Alkasassbeh

Abstract: Network anomalies are destructive to networks. Intrusion detection systems monitor network component behavior to detect unusual activity (i.e., possible threats). Application-layer Simple Network Management Protocol (SNMP) has been used for decades via TCP/IP protocol to manage network devices. Raw data security evaluation in intrusion detection incurs latency in detection. Management Information… ▽ More Network anomalies are destructive to networks. Intrusion detection systems monitor network component behavior to detect unusual activity (i.e., possible threats). Application-layer Simple Network Management Protocol (SNMP) has been used for decades via TCP/IP protocol to manage network devices. Raw data security evaluation in intrusion detection incurs latency in detection. Management Information Base (MIB) combined with SNMP is a solution for this, the traditional approach of SNMP is centralized. Thus, rendering it unreliable and non-adaptive to network changes when it comes to distributed network. In distributed network, using single or multiple light Mobile Agents are an optimal solution for data gathering as they can move from one source node to another, executing naturally at each. This helps complete tasks without increasing the network overheads, and contributes to decreasing latency △ Less

Submitted 18 July, 2019; originally announced September 2019.

arXiv:1906.00865 [pdf]

Network Attacks Anomaly Detection Using SNMP MIB Interface Parameters

Authors: Ghazi Al-Naymatm, Ahmed Hambouz, Mouhammd Alkasassbeh

Abstract: Many approaches have evolved to enhance network attacks detection anomaly using SNMP-MIBs. Most of these approaches focus on machine learning algorithms with a lot of SNMP-MIB database parameters, which may consume most of hardware resources (CPU, memory, and bandwidth). In this paper we introduce an efficient detection model to detect network attacks anomaly using Lazy.IBk as a machine learning c… ▽ More Many approaches have evolved to enhance network attacks detection anomaly using SNMP-MIBs. Most of these approaches focus on machine learning algorithms with a lot of SNMP-MIB database parameters, which may consume most of hardware resources (CPU, memory, and bandwidth). In this paper we introduce an efficient detection model to detect network attacks anomaly using Lazy.IBk as a machine learning classifier and Correlation, and ReliefF as attribute evaluators on SNMP-MIB interface parameters. This model achieved accurate results (100%) with minimal hardware resources consumption. Thus, this model can be adopted in intrusion detection system (IDS) to increase its performance and efficiency. △ Less

Submitted 19 October, 2019; v1 submitted 14 May, 2019; originally announced June 2019.

arXiv:1906.00863 [pdf]

Detecting network anomalies using machine learning and SNMP-MIB dataset with IP group

Authors: Abdelrahman Manna, Mouhammd Alkasassbeh

Abstract: SNMP-MIB is a widely used approach that uses machine learning to classify data and obtain results, but using SNMP-MIB huge dataset is not efficient and it is also time and resources consuming. In this paper, a REP Tree, J48(Decision Tree) and Random Forest classifiers were used to train a model that can detect the anomalies and predict the network attacks that my affect the Internet Protocol(IP) g… ▽ More SNMP-MIB is a widely used approach that uses machine learning to classify data and obtain results, but using SNMP-MIB huge dataset is not efficient and it is also time and resources consuming. In this paper, a REP Tree, J48(Decision Tree) and Random Forest classifiers were used to train a model that can detect the anomalies and predict the network attacks that my affect the Internet Protocol(IP) group. This trained model can be used in the devices that are used to detect the anomalies such as intrusion detection systems. △ Less

Submitted 14 May, 2019; originally announced June 2019.

arXiv:1811.08954 [pdf]

Fuzzy Rule Interpolation and SNMP-MIB for Emerging Network Abnormality

Authors: Mohammad Almseidin, Mouhammd Alkasassbeh, Szilveszter Kovacs

Abstract: It is difficult to implement an efficient detection approach for Intrusion Detection Systems (IDS) and many factors contribute to this challenge. One such challenge concerns establishing adequate boundaries and finding a proper data source. Typical IDS detection approaches deal with raw traffics. These traffics need to be studied in depth and thoroughly investigated in order to extract the require… ▽ More It is difficult to implement an efficient detection approach for Intrusion Detection Systems (IDS) and many factors contribute to this challenge. One such challenge concerns establishing adequate boundaries and finding a proper data source. Typical IDS detection approaches deal with raw traffics. These traffics need to be studied in depth and thoroughly investigated in order to extract the required knowledge base. Another challenge involves implementing the binary decision. This is because there are no reasonable limits between normal and attack traffics patterns. In this paper, we introduce a novel idea capable of supporting the proper data source while avoiding the issues associated with the binary decision. This paper aims to introduce a detection approach for defining abnormality by using the Fuzzy Rule Interpolation (FRI) with Simple Network Management Protocol (SNMP) Management Information Base (MIB) parameters. The strength of the proposed detection approach is based on adapting the SNMP-MIB parameters with the FRI. This proposed method eliminates the raw traffic processing component which is time consuming and requires extensive computational measures. It also eliminates the need for a complete fuzzy rule based intrusion definition. The proposed approach was tested and evaluated using an open source SNMP-MIB dataset and obtained a 93% detection rate. Additionally, when compared to other literature in which the same test-bed environment was employed along with the same number of parameters, the proposed detection approach outperformed the support vector machine and neural network. Therefore, combining the SNMP-MIB parameters with the FRI based reasoning could be beneficial for detecting intrusions, even in the case if the fuzzy rule based intrusion definition is incomplete (not fully defined). △ Less

Submitted 21 November, 2018; originally announced November 2018.

Comments: 10

arXiv:1810.07252 [pdf]

Classification of malware based on file content and characteristics

Authors: Mouhammd Alkasassbeh, Samail Al-Daleen

Abstract: In general, the industry of malware has come to be a market which brings on loads of money by investing and implementing high end technology to escape traditional detection while vendors of anti-malware spend thousands if not millions of dollars to stop the malware breach since it not only causes financial losses but also emotional ones. This paper study the classification of malware based on file… ▽ More In general, the industry of malware has come to be a market which brings on loads of money by investing and implementing high end technology to escape traditional detection while vendors of anti-malware spend thousands if not millions of dollars to stop the malware breach since it not only causes financial losses but also emotional ones. This paper study the classification of malware based on file content and characteristics, this was done through use of Clamp Integrated dataset that includes 5210 instances. There are different algorithms were applied using Weka software, which are; ZeroR, bayesNet, SMO, KNN, J48, as well as Random Forest. The obtained results showed that Random Forest that achieved the highest overall accuracy of (99.0979%). This means that Random Forest algorithm is efficient to be used in malware classification based on file content and characteristics. △ Less

Submitted 26 September, 2018; originally announced October 2018.

Comments: 12

arXiv:1809.02610 [pdf]

Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract: Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed… ▽ More Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE. △ Less

Submitted 1 September, 2018; originally announced September 2018.

Comments: ICCCNT 2018 - The 20th International Conference on Computing, Communication. arXiv admin note: substantial text overlap with arXiv:1805.10458

arXiv:1801.05309 [pdf]

A Novel Hybrid Method for Network Anomaly Detection Based on Traffic Prediction and Change Point Detection

Authors: Mouhammd Alkasassbeh

Abstract: In recent years, computer networks have become more and more advanced in terms of size, applications, complexity and level of heterogeneity. Moreover, availability and performance are important issues for end users. New types of cyber-attacks that can affect and damage network performance and availability are constantly emerging and some threats, such as Distributed Denial of Service (DDoS) attack… ▽ More In recent years, computer networks have become more and more advanced in terms of size, applications, complexity and level of heterogeneity. Moreover, availability and performance are important issues for end users. New types of cyber-attacks that can affect and damage network performance and availability are constantly emerging and some threats, such as Distributed Denial of Service (DDoS) attacks, can be very dangerous and cannot be easily prevented. In this study, we present a novel hybrid approach to detecting a DDoS attack by means of monitoring abnormal traffic in the network. This approach reads traffic data and from that it is possible to build a model, by means of which future data may be predicted and compared with observed data, in order to detect any abnormal traffic. This approach combines two methods: traffic prediction and changing detection. To the best of our knowledge, such a combination has never been used in this area before. The approach achieved a highly significant accuracy rate of 98.3% and sensitivity was 100%, which means that all potential attacks are detected and prevented from penetrating the network system. △ Less

Submitted 5 January, 2018; originally announced January 2018.

arXiv:1801.02330 [pdf]

Evaluation of Machine Learning Algorithms for Intrusion Detection System

Authors: Mohammad Almseidin, Maen Alzubi, Szilveszter Kovacs, Mouhammd Alkasassbeh

Abstract: Intrusion detection system (IDS) is one of the implemented solutions against harmful attacks. Furthermore, attackers always keep changing their tools and techniques. However, implementing an accepted IDS system is also a challenging task. In this paper, several experiments have been performed and evaluated to assess various machine learning classifiers based on KDD intrusion dataset. It succeeded… ▽ More Intrusion detection system (IDS) is one of the implemented solutions against harmful attacks. Furthermore, attackers always keep changing their tools and techniques. However, implementing an accepted IDS system is also a challenging task. In this paper, several experiments have been performed and evaluated to assess various machine learning classifiers based on KDD intrusion dataset. It succeeded to compute several performance metrics in order to evaluate the selected classifiers. The focus was on false negative and false positive performance metrics in order to enhance the detection rate of the intrusion detection system. The implemented experiments demonstrated that the decision table classifier achieved the lowest value of false negative while the random forest classifier has achieved the highest average accuracy rate. △ Less

Submitted 8 January, 2018; originally announced January 2018.

Journal ref: Intelligent Systems and Informatics (SISY), 2017 IEEE 15th International Symposium

arXiv:1712.09623 [pdf]

An empirical evaluation for the intrusion detection features based on machine learning and feature selection methods

Authors: Mouhammd Alkasassbeh

Abstract: Despite the great developments in information technology, particularly the Internet, computer networks, global information exchange, and its positive impact in all areas of daily life, it has also contributed to the development of penetration and intrusion which forms a high risk to the security of information organizations, government agencies, and causes large economic losses. There are many tec… ▽ More Despite the great developments in information technology, particularly the Internet, computer networks, global information exchange, and its positive impact in all areas of daily life, it has also contributed to the development of penetration and intrusion which forms a high risk to the security of information organizations, government agencies, and causes large economic losses. There are many techniques designed for protection such as firewall and intrusion detection systems (IDS). IDS is a set of software and/or hardware techniques used to detect hacker's activities in computer systems. Two types of anomalies are used in IDS to detect intrusive activities different from normal user behavior. Misuse relies on the knowledge base that contains all known attack techniques and intrusion is discovered through research in this knowledge base. Artificial intelligence techniques have been introduced to improve the performance of these systems. The importance of IDS is to identify unauthorized access attempting to compromise confidentiality, integrity or availability of the computer network. This paper investigates the Intrusion Detection (ID) problem using three machine learning algorithms namely, BayesNet algorithm, Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM). The algorithms are applied on a real, Management Information Based (MIB) dataset that is collected from real life environment. To enhance the detection process accuracy, a set of feature selection approaches is used; Infogain (IG), ReleifF (RF), and Genetic Search (GS). Our experiments show that the three feature selection methods have enhanced the classification performance. GS with bayesNet, MLP and SVM give high accuracy rates, more specifically the BayesNet with the GS accuracy rate is 99.9%. △ Less

Submitted 27 December, 2017; originally announced December 2017.

Journal ref: Journal of Theoretical and Applied Information Technology 30th November 2017 -- Vol. 95. No. 22 -- 2017

arXiv:1602.08313 [pdf]

Enhancing Genetic Algorithms using Multi Mutations

Authors: Ahmad B. A. Hassanat, Esra'a Alkafaween, Nedal A. Al-Nawaiseh, Mohammad A. Abbadi, Mouhammd Alkasassbeh, Mahmoud B. Alhasanat

Abstract: Mutation is one of the most important stages of the genetic algorithm because of its impact on the exploration of global optima, and to overcome premature convergence. There are many types of mutation, and the problem lies in selection of the appropriate type, where the decision becomes more difficult and needs more trial and error. This paper investigates the use of more than one mutation operato… ▽ More Mutation is one of the most important stages of the genetic algorithm because of its impact on the exploration of global optima, and to overcome premature convergence. There are many types of mutation, and the problem lies in selection of the appropriate type, where the decision becomes more difficult and needs more trial and error. This paper investigates the use of more than one mutation operator to enhance the performance of genetic algorithms. Novel mutation operators are proposed, in addition to two selection strategies for the mutation operators, one of which is based on selecting the best mutation operator and the other randomly selects any operator. Several experiments on some Travelling Salesman Problems (TSP) were conducted to evaluate the proposed methods, and these were compared to the well-known exchange mutation and rearrangement mutation. The results show the importance of some of the proposed methods, in addition to the significant enhancement of the genetic algorithm's performance, particularly when using more than one mutation operator. △ Less

Submitted 10 January, 2018; v1 submitted 26 February, 2016; originally announced February 2016.

Comments: 17 pages, 11 figures, 1 table, 41 references

Journal ref: International Journal of Computer Science and Information Security 14, no. 7 (2016): 785

arXiv:1501.00687 [pdf]

On Enhancing The Performance Of Nearest Neighbour Classifiers Using Hassanat Distance Metric

Authors: Mouhammd Alkasassbeh, Ghada A. Altarawneh, Ahmad B. A. Hassanat

Abstract: We showed in this work how the Hassanat distance metric enhances the performance of the nearest neighbour classifiers. The results demonstrate the superiority of this distance metric over the traditional and most-used distances, such as Manhattan distance and Euclidian distance. Moreover, we proved that the Hassanat distance metric is invariant to data scale, noise and outliers. Throughout this wo… ▽ More We showed in this work how the Hassanat distance metric enhances the performance of the nearest neighbour classifiers. The results demonstrate the superiority of this distance metric over the traditional and most-used distances, such as Manhattan distance and Euclidian distance. Moreover, we proved that the Hassanat distance metric is invariant to data scale, noise and outliers. Throughout this work, it is clearly notable that both ENN and IINC performed very well with the distance investigated, as their accuracy increased significantly by 3.3% and 3.1% respectively, with no significant advantage of the ENN over the IINC in terms of accuracy. Correspondingly, it can be noted from our results that there is no optimal algorithm that can solve all real-life problems perfectly; this is supported by the no-free-lunch theorem △ Less

Submitted 4 January, 2015; originally announced January 2015.

Comments: Canadian Journal of Pure and Applied Sciences (CJPAS). volume 9, issue 1, Feb 2015

Showing 1–16 of 16 results for author: Alkasassbeh, M