
Explainable Deep Learning Models for Dynamic and Online Malware Classification

Authors: Quincy Card, Daniel Simpson, Kshitiz Aryal, Maanak Gupta, Sheikh Rabiul Islam

Abstract: In recent years, there has been a significant surge in malware attacks, necessitating more advanced preventive measures and remedial strategies. While several successful AI-based malware classification approaches exist categorized into static, dynamic, or online analysis, most successful AI models lack easily interpretable decisions and explanations for their processes. Our paper aims to delve into explainable malware classification across various execution environments (such as dynamic and online), thoroughly analyzing their respective strengths, weaknesses, and commonalities. To evaluate our approach, we train Feed Forward Neural Networks (FFNN) and Convolutional Neural Networks (CNN) to classify malware based on features obtained from dynamic and online analysis environments. The feature attribution for malware classification is performed by explainability tools, SHAP, LIME and Permutation Importance. We perform a detailed evaluation of the calculated global and local explanations from the experiments, discuss limitations and, ultimately, offer recommendations for achieving a balanced approach.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2311.16180 [pdf]

Aiming to Minimize Alcohol-Impaired Road Fatalities: Utilizing Fairness-Aware and Domain Knowledge-Infused Artificial Intelligence

Authors: Tejas Venkateswaran, Sheikh Rabiul Islam, Md Golam Moula Mehedi Hasan, Mohiuddin Ahmed

Abstract: Approximately 30% of all traffic fatalities in the United States are attributed to alcohol-impaired driving. This means that, despite stringent laws against this offense in every state, the frequency of drunk driving accidents is alarming, resulting in approximately one person being killed every 45 minutes. The process of charging individuals with Driving Under the Influence (DUI) is intricate and can sometimes be subjective, involving multiple stages such as observing the vehicle in motion, interacting with the driver, and conducting Standardized Field Sobriety Tests (SFSTs). Biases have been observed through racial profiling, leading to some groups and geographical areas facing fewer DUI tests, resulting in many actual DUI incidents going undetected, ultimately leading to a higher number of fatalities. To tackle this issue, our research introduces an Artificial Intelligence-based predictor that is both fairness-aware and incorporates domain knowledge to analyze DUI-related fatalities in different geographic locations. Through this model, we gain intriguing insights into the interplay between various demographic groups, including age, race, and income. By utilizing the provided information to allocate policing resources in a more equitable and efficient manner, there is potential to reduce DUI-related fatalities and have a significant impact on road safety.

Submitted 24 November, 2023; originally announced November 2023.

Comments: IEEE Big Data 2023

arXiv:2311.14883 [pdf]

Predicting Potential School Shooters from Social Media Posts

Authors: Alana Cedeno, Rachel Liang, Sheikh Rabiul Islam

Abstract: The rate of terror attacks has surged over the past decade, resulting in the tragic and senseless loss or alteration of numerous lives. Offenders behind mass shootings, bombings, or other domestic terrorism incidents have historically exhibited warning signs on social media before carrying out actual incidents. However, due to inadequate and comprehensive police procedures, authorities and social media platforms are often unable to detect these early indicators of intent. To tackle this issue, we aim to create a multimodal model capable of predicting sentiments simultaneously from both images (i.e., social media photos) and text (i.e., social media posts), generating a unified prediction. The proposed method involves segregating the image and text components of an online post and utilizing a captioning model to generate sentences summarizing the image's contents. Subsequently, a sentiment analyzer evaluates this caption, or description, along with the original post's text to determine whether the post is positive (i.e., concerning) or negative (i.e., benign). This undertaking represents a significant step toward implementing the developed system in real-world scenarios.

Submitted 24 November, 2023; originally announced November 2023.

Journal ref: IEEE Big Data 2023

arXiv:2107.14095 [pdf, other]

Exploring the Scope and Potential of Local Newspaper-based Dengue Surveillance in Bangladesh

Authors: Nazia Tasnim, Md. Istiak Hossain Shihab, Moqsadur Rahman, Sheikh Rabiul Islam, Mohammad Ruhul Amin

Abstract: Dengue fever has been considered to be one of the global public health problems of the twenty-first century, especially in tropical and subtropical countries of the global south. The high morbidity and mortality rates of Dengue fever impose a huge economic and health burden for middle and low-income countries. It is so prevalent in such regions that enforcing a granular level of surveillance is quite impossible. Therefore, it is crucial to explore an alternative cost-effective solution that can provide updates of the ongoing situation in a timely manner. In this paper, we explore the scope and potential of a local newspaper-based dengue surveillance system, using well-known data-mining techniques, in Bangladesh from the analysis of the news contents written in the native language. In addition, we explain the working procedure of developing a novel database, using human-in-the-loop technique, for further analysis, and classification of dengue and its intervention-related news. Our classification method has an f-score of 91.45%, and matches the ground truth of reported cases quite closely. Based on the dengue and intervention-related news, we identified the regions where more intervention efforts are needed to reduce the rate of dengue infection. A demo of this project can be accessed at: http://erdos.dsm.fordham.edu:3009/

Submitted 7 July, 2021; originally announced July 2021.

Comments: 5 Pages, Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare

arXiv:2101.09429 [pdf, other]

Explainable Artificial Intelligence Approaches: A Survey

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor, Mohiuddin Ahmed

Abstract: The lack of explainability of a decision from an Artificial Intelligence (AI) based "black box" system/model, despite its superiority in many real-world applications, is a key stumbling block for adopting AI in many high stakes applications across different domains and industries. While many popular Explainable Artificial Intelligence (XAI) methods or approaches are available to facilitate a human-friendly explanation of the decision, each has its own merits and demerits, with a plethora of open challenges. We demonstrate popular XAI methods with a mutual case study/task (i.e., credit default prediction), analyze for competitive advantages from multiple perspectives (e.g., local, global), provide meaningful insight on quantifying explainability, and recommend paths towards responsible or human-centered AI using XAI as a medium. Practitioners can use this work as a catalog to understand, compare, and correlate competitive advantages of popular XAI methods. In addition, this survey elicits future research directions towards responsible or human-centric AI systems, which is crucial to adopt AI in high stakes applications.

Submitted 23 January, 2021; originally announced January 2021.

arXiv:1911.10104 [pdf, other]

Towards Quantification of Explainability in Explainable Artificial Intelligence Methods

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor

Abstract: Artificial Intelligence (AI) has become an integral part of domains such as security, finance, healthcare, medicine, and criminal justice. Explaining the decisions of AI systems in human terms is a key challenge--due to the high complexity of the model, as well as the potential implications on human interests, rights, and lives. While Explainable AI is an emerging field of research, there is no consensus on the definition, quantification, and formalization of explainability. In fact, the quantification of explainability is an open challenge. In our previous work, we incorporated domain knowledge for better explainability, however, we were unable to quantify the extent of explainability. In this work, we (1) briefly analyze the definitions of explainability from the perspective of different disciplines (e.g., psychology, social science), properties of explanation, explanation methods, and human-friendly explanations; and (2) propose and formulate an approach to quantify the extent of explainability. Our experimental result suggests a reasonable and model-agnostic way to quantify explainability.

Submitted 22 November, 2019; originally announced November 2019.

Comments: Submitted to FLAIRS-33

arXiv:1911.09858 [pdf, other]

Investigating bankruptcy prediction models in the presence of extreme class imbalance and multiple stages of economy

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor, Sid C. Bundy, Douglas A. Talbert, Ambareen Siraj

Abstract: In the area of credit risk analytics, current Bankruptcy Prediction Models (BPMs) struggle with (a) the availability of comprehensive and real-world data sets and (b) the presence of extreme class imbalance in the data (i.e., very few samples for the minority class) that degrades the performance of the prediction model. Moreover, little research has compared the relative performance of well-known BPMs on public datasets addressing the class imbalance problem. In this work, we apply eight classes of well-known BPMs, as suggested by a review of decades of literature, on a new public dataset named Freddie Mac Single-Family Loan-Level Dataset with resampling (i.e., adding synthetic minority samples) of the minority class to tackle class imbalance. Additionally, we apply some recent AI techniques (e.g., tree-based ensemble techniques) that demonstrate potentially better results on models trained with resampled data. In addition, from the analysis of 19 years (1999-2017) of data, we discover that models behave differently when presented with sudden changes in the economy (e.g., a global financial crisis) resulting in abrupt fluctuations in the national default rate. In summary, this study should aid practitioners/researchers in determining the appropriate model with respect to data that contains a class imbalance and various economic stages.

Submitted 22 November, 2019; originally announced November 2019.

Comments: Under review in Expert Systems with Applications

arXiv:1911.09853 [pdf, other]

Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor, Ambareen Siraj, Mike Rogers

Abstract: Artificial Intelligence (AI) has become an integral part of modern-day security solutions for its ability to learn very complex functions and handle "Big Data". However, the lack of explainability and interpretability of successful AI models is a key stumbling block when trust in a model's prediction is critical. This leads to human intervention, which in turn results in a delayed response or decision. While there have been major advancements in the speed and performance of AI-based intrusion detection systems, the response is still at human speed when it comes to explaining and interpreting a specific prediction or decision. In this work, we infuse popular domain knowledge (i.e., CIA principles) in our model for better explainability and validate the approach on a network intrusion detection test case. Our experimental results suggest that the infusion of domain knowledge provides better explainability as well as a faster decision or response. In addition, the infused domain knowledge generalizes the model to work well with unknown attacks, as well as opens the path to adapt to a large stream of network traffic from numerous IoT devices.

Submitted 22 February, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: Accepted to be published in the Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020

arXiv:1910.06469 [pdf, other]

doi 10.13140/RG.2.2.12785.84326

Automated Ransomware Behavior Analysis: Pattern Extraction and Early Detection

Authors: Qian Chen, Sheikh Rabiul Islam, Henry Haswell, Robert A. Bridges

Abstract: Security operation centers (SOCs) typically use a variety of tools to collect large volumes of host logs for detection and forensics of intrusions. Our experience, supported by recent user studies on SOC operators, indicates that operators spend ample time (e.g., hundreds of man-hours) on investigations into logs seeking adversarial actions. Similarly, reconfiguration of tools to adapt detectors for future similar attacks is commonplace upon gaining novel insights (e.g., through internal investigation or shared indicators). This paper presents an automated malware pattern-extraction and early detection tool, testing three machine learning approaches: TF-IDF (term frequency-inverse document frequency), Fisher's LDA (linear discriminant analysis) and ET (extra trees/extremely randomized trees) that can (1) analyze freshly discovered malware samples in sandboxes and generate dynamic analysis reports (host logs); (2) automatically extract the sequence of events induced by malware given a large volume of ambient (un-attacked) host logs, and the relatively few logs from hosts that are infected with potentially polymorphic malware; (3) rank the most discriminating features (unique patterns) of malware and from the learned behavior detect malicious activity; and (4) allow operators to visualize the discriminating features and their correlations to facilitate malware forensic efforts. To validate the accuracy and efficiency of our tool, we design three experiments and test seven ransomware attacks (i.e., WannaCry, DBGer, Cerber, Defray, GandCrab, Locky, and nRansom). The experimental results show that TF-IDF is the best of the three methods to identify discriminating features, and ET is the most time-efficient and robust approach.

Submitted 14 October, 2019; originally announced October 2019.

Comments: The 2nd International Conference on Science of Cyber Security - SciSec 2019; Springer's Lecture Notes in Computer Science (LNCS) series

arXiv:1905.11474 [pdf, other]

Infusing domain knowledge in AI-based "black box" models for better explainability with application in bankruptcy prediction

Authors: Sheikh Rabiul Islam, William Eberle, Sid Bundy, Sheikh Khaled Ghafoor

Abstract: Although "black box" models such as Artificial Neural Networks, Support Vector Machines, and Ensemble Approaches continue to show superior performance in many disciplines, their adoption in the sensitive disciplines (e.g., finance, healthcare) is questionable due to the lack of interpretability and explainability of the model. In fact, future adoption of "black box" models is difficult because of the recent rule of "right of explanation" by the European Union where a user can ask for an explanation behind an algorithmic decision, and the newly proposed bill by the US government, the "Algorithmic Accountability Act", which would require companies to assess their machine learning systems for bias and discrimination and take corrective measures. Top Bankruptcy Prediction Models are A.I.-based and are in need of better explainability - the extent to which the internal working mechanisms of an AI system can be explained in human terms. Although explainable artificial intelligence is an emerging field of research, infusing domain knowledge for better explainability might be a possible solution. In this work, we demonstrate a way to collect and infuse domain knowledge into a "black box" model for bankruptcy prediction. Our understanding from the experiments reveals that infused domain knowledge makes the output from the black box model more interpretable and explainable.

Submitted 30 May, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: Under review in KDD, 2019 : 2nd KDD Workshop on Anomaly Detection in Finance

arXiv:1809.02769 [pdf, other]

Worldcoin: A Hypothetical Cryptocurrency for the People and its Government

Authors: Sheikh Rabiul Islam

Abstract: The world of cryptocurrency is not transparent enough though it was established for innate transparent tracking of capital flows. The most contributing factor is the violation of securities laws and scam in Initial Coin Offering (ICO) which is used to raise capital through crowdfunding. There is a lack of proper regularization and appreciation from governments around the world which is a serious problem for the integrity of cryptocurrency market. We present a hypothetical case study of a new cryptocurrency to establish the transparency and equal right for every citizen to be part of a global system through the collaboration between people and government. The possible outcome is a model of a regulated and trusted cryptocurrency infrastructure that can be further tailored to different sectors with a different scheme.

Submitted 8 September, 2018; originally announced September 2018.

Comments: Under dual review in GSU FinTech Conference and The Review of Financial Studies Journal

arXiv:1807.01176 [pdf]

Credit Default Mining Using Combined Machine Learning and Heuristic Approach

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor

Abstract: Predicting potential credit default accounts in advance is challenging. Traditional statistical techniques typically cannot handle large amounts of data and the dynamic nature of fraud and humans. To tackle this problem, recent research has focused on artificial and computational intelligence based approaches. In this work, we present and validate a heuristic approach to mine potential default accounts in advance where a risk probability is precomputed from all previous data and the risk probability for recent transactions is computed as soon as they happen. Besides our heuristic approach, we also apply a recently proposed machine learning approach that has not been applied previously on our targeted dataset [15]. As a result, we find that these applied approaches outperform existing state-of-the-art approaches.

Submitted 2 July, 2018; originally announced July 2018.

Comments: Accepted for ICDATA, 2018

arXiv:1807.00939 [pdf, other]

doi 10.1109/BigData.2018.8622303

Mining Illegal Insider Trading of Stocks: A Proactive Approach

Authors: Sheikh Rabiul Islam, Sheikh Khaled Ghafoor, William Eberle

Abstract: Illegal insider trading of stocks is based on releasing non-public information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Detecting illegal insider trading is difficult due to the complex, nonlinear, and non-stationary nature of the stock market. In this work, we present an approach that detects and predicts illegal insider trading proactively from large heterogeneous sources of structured and unstructured data using a deep-learning based approach combined with discrete signal processing on the time series data. In addition, we use a tree-based approach that visualizes events and actions to aid analysts in their understanding of large amounts of unstructured data. Using existing data, we have discovered that our approach has a good success rate in detecting illegal insider trading patterns.

Submitted 7 November, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: Accepted in IEEE BigData 2018

Journal ref: 2018 IEEE International Conference on Big Data (Big Data)

arXiv:1807.00819 [pdf]

doi 10.1145/3093241.3093279

Mining Bad Credit Card Accounts from OLAP and OLTP

Authors: Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor

Abstract: Credit card companies classify accounts as good or bad based on historical data where a bad account may default on payments in the near future. If an account is classified as a bad account, then further action can be taken to investigate the actual nature of the account and take preventive actions. In addition, marking an account as "good" when it is actually bad could lead to loss of revenue, and marking an account as "bad" when it is actually good could lead to loss of business. However, detecting bad credit card accounts in real time from Online Transaction Processing (OLTP) data is challenging due to the volume of data needed to be processed to compute the risk factor. We propose an approach which precomputes and maintains the risk probability of an account based on historical transactions data from offline data or data from a data warehouse. Furthermore, using the most recent OLTP transactional data, risk probability is calculated for the latest transaction and combined with the previously computed risk probability from the data warehouse. If accumulated risk probability crosses a predefined threshold, then the account is treated as a bad account and is flagged for manual verification.

Submitted 2 July, 2018; originally announced July 2018.

Comments: Conference proceedings of ICCDA, 2017

Journal ref: Islam, S. R., Eberle, W., & Ghafoor, S. K. (2017, May). Mining Bad Credit Card Accounts from OLAP and OLTP. In Proceedings of the International Conference on Compute and Data Analysis (pp. 129-137). ACM
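
The flagging rule described in the abstract above (a precomputed historical risk probability combined with the risk of the latest transaction, then compared with a threshold) can be illustrated with a minimal sketch. The abstract does not say how the two probabilities are combined, so the weighted average used here is purely an assumption, as are the names `Account`, `combine_risk`, `is_flagged` and the default values of `weight` and `threshold`.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    # Risk probability in [0, 1], assumed to be precomputed offline
    # from historical (data warehouse) transactions.
    historical_risk: float


def combine_risk(historical_risk: float, transaction_risk: float,
                 weight: float = 0.5) -> float:
    """Blend the stored risk with the latest transaction's risk.

    The weighted average is an illustrative assumption; the paper's
    actual combination rule is not given in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1.0 - weight) * historical_risk + weight * transaction_risk


def is_flagged(account: Account, transaction_risk: float,
               threshold: float = 0.7, weight: float = 0.5) -> bool:
    """Return True when the combined risk crosses the (hypothetical) threshold."""
    combined = combine_risk(account.historical_risk, transaction_risk, weight)
    return combined > threshold


if __name__ == "__main__":
    risky = Account("acct-1", historical_risk=0.8)
    safe = Account("acct-2", historical_risk=0.2)
    print(is_flagged(risky, transaction_risk=0.9))
    print(is_flagged(safe, transaction_risk=0.4))
```

Keeping the historical term as a stored value means only the latest transaction has to be scored at request time, which is the motivation the abstract gives for precomputing from the data warehouse.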

arXiv:1806.08755

Perfect 3-Colorings on 4-Regular Graph of Order 8

Authors: Sk Rabiul Islam, Sayantan Maity, Ashish Kumar Upadhyay

Abstract: We study the perfect $3$-colorings on 4-regular graphs of order 8.

Submitted 30 June, 2018; v1 submitted 22 June, 2018; originally announced June 2018.

Comments: There is error in the proof

MSC Class: 03E02; 05C15; 68R05

Showing 1–15 of 15 results for author: Islam, S R