Search | arXiv e-print repository

SoK: Leveraging Transformers for Malware Analysis

Authors: Pradip Kunwar, Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Elisa Bertino

Abstract: The introduction of transformers has been an important breakthrough for AI research and application as transformers are the foundation behind Generative AI. A promising application domain for transformers is cybersecurity, in particular the malware domain analysis. The reason is the flexibility of the transformer models in handling long sequential features and understanding contextual relationship… ▽ More The introduction of transformers has been an important breakthrough for AI research and application as transformers are the foundation behind Generative AI. A promising application domain for transformers is cybersecurity, in particular the malware domain analysis. The reason is the flexibility of the transformer models in handling long sequential features and understanding contextual relationships. However, as the use of transformers for malware analysis is still in the infancy stage, it is critical to evaluate, systematize, and contextualize existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing knowledge, we structure and propose taxonomies based on: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis and discuss open challenges with future research directions. We believe that this SoK paper will assist the research community in gaining detailed insights from existing work and will serve as a foundational resource for implementing novel research using transformers for malware analysis. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.01728 [pdf, other]

Explainability Guided Adversarial Evasion Attacks on Malware Detectors

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh

Abstract: As the focus on security of Artificial Intelligence (AI) is becoming paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, this adversarial sample generation relies heavily on the accuracy and placement of crafted perturbation with the goal of evading a trained classifier. This work focuses on applying explainabilit… ▽ More As the focus on security of Artificial Intelligence (AI) is becoming paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, this adversarial sample generation relies heavily on the accuracy and placement of crafted perturbation with the goal of evading a trained classifier. This work focuses on applying explainability techniques to enhance the adversarial evasion attack on a machine-learning-based Windows PE malware detector. The explainable tool identifies the regions of PE malware files that have the most significant impact on the decision-making process of a given malware detector, and therefore, the same regions can be leveraged to inject the adversarial perturbation for maximum efficiency. Profiling all the PE malware file regions based on their impact on the malware detector's decision enables the derivation of an efficient strategy for identifying the optimal location for perturbation injection. The strategy should incorporate the region's significance in influencing the malware detector's decision and the sensitivity of the PE malware file's integrity towards modifying that region. To assess the utility of explainable AI in crafting an adversarial sample of Windows PE malware, we utilize the DeepExplainer module of SHAP for determining the contribution of each region of PE malware to its detection by a CNN-based malware detector, MalConv. Furthermore, we analyzed the significance of SHAP values at a more granular level by subdividing each section of Windows PE into small subsections. We then performed an adversarial evasion attack on the subsections based on the corresponding SHAP values of the byte sequences. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2403.06428 [pdf, other]

Intra-Section Code Cave Injection for Adversarial Evasion Attacks on Windows PE Malware File

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh

Abstract: Windows malware is predominantly available in cyberspace and is a prime target for deliberate adversarial evasion attacks. Although researchers have investigated the adversarial malware attack problem, a multitude of important questions remain unanswered, including (a) Are the existing techniques to inject adversarial perturbations in Windows Portable Executable (PE) malware files effective enough… ▽ More Windows malware is predominantly available in cyberspace and is a prime target for deliberate adversarial evasion attacks. Although researchers have investigated the adversarial malware attack problem, a multitude of important questions remain unanswered, including (a) Are the existing techniques to inject adversarial perturbations in Windows Portable Executable (PE) malware files effective enough for evasion purposes?; (b) Does the attack process preserve the original behavior of malware?; (c) Are there unexplored approaches/locations that can be used to carry out adversarial evasion attacks on Windows PE malware?; and (d) What are the optimal locations and sizes of adversarial perturbations required to evade an ML-based malware detector without significant structural change in the PE file? To answer some of these questions, this work proposes a novel approach that injects a code cave within the section (i.e., intra-section) of Windows PE malware files to make space for adversarial perturbations. In addition, a code loader is also injected inside the PE file, which reverts adversarial malware to its original form during the execution, preserving the malware's functionality and executability. To understand the effectiveness of our approach, we injected adversarial perturbations inside the .text, .data and .rdata sections, generated using the gradient descent and Fast Gradient Sign Method (FGSM), to target the two popular CNN-based malware detectors, MalConv and MalConv2. Our experiments yielded notable results, achieving a 92.31% evasion rate with gradient descent and 96.26% with FGSM against MalConv, compared to the 16.17% evasion rate for append attacks. Similarly, when targeting MalConv2, our approach achieved a remarkable maximum evasion rate of 97.93% with gradient descent and 94.34% with FGSM, significantly surpassing the 4.01% evasion rate observed with append attacks. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.09097 [pdf, other]

A Digital Twin prototype for traffic sign recognition of a learning-enabled autonomous vehicle

Authors: Mohamed AbdElSalam, Loai Ali, Saddek Bensalem, Weicheng He, Panagiotis Katsaros, Nikolaos Kekatos, Doron Peled, Anastasios Temperekidis, Changshun Wu

Abstract: In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane kee**. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehi… ▽ More In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane kee**. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehicle model that is designed in Amesim tool, ii) an environment model developed in Prescan, iii) a lane-kee** controller designed in Robot Operating System, and iv) a perception and speed control module developed in the formal modeling language of BIP (Behavior, Interaction, Priority). These clients interface with the digital twin platform, PAVE360-Veloce System Interconnect (PAVE360-VSI). PAVE360-VSI acts as the co-simulation orchestrator and is responsible for synchronization, interconnection, and data exchange through a server. The server establishes connections among the different clients and also ensures adherence to the Ethernet protocol. We conclude with illustrative digital twin simulations and recommendations for future work. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2310.08312 [pdf, other]

GePSAn: Generative Procedure Step Anticipation in Cooking Videos

Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focus on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i… ▽ More We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focus on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input, and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities, and then transfer the model to the video domain. Our experiments, both in textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on the next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, ie, without fine-tuning or adaptation, and produces good-quality future step predictions from video. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: published at ICCV 2023

arXiv:2303.01679 [pdf, other]

Automated Machine Learning for Deep Learning based Malware Detection

Authors: Austin Brown, Maanak Gupta, Mahmoud Abdelsalam

Abstract: Deep learning (DL) has proven to be effective in detecting sophisticated malware that is constantly evolving. Even though deep learning has alleviated the feature engineering problem, finding the most optimal DL model, in terms of neural architecture search (NAS) and the model's optimal set of hyper-parameters, remains a challenge that requires domain expertise. In addition, many of the proposed s… ▽ More Deep learning (DL) has proven to be effective in detecting sophisticated malware that is constantly evolving. Even though deep learning has alleviated the feature engineering problem, finding the most optimal DL model, in terms of neural architecture search (NAS) and the model's optimal set of hyper-parameters, remains a challenge that requires domain expertise. In addition, many of the proposed state-of-the-art models are very complex and may not be the best fit for different datasets. A promising approach, known as Automated Machine Learning (AutoML), can reduce the domain expertise required to implement a custom DL model. AutoML reduces the amount of human trial-and-error involved in designing DL models, and in more recent implementations can find new model architectures with relatively low computational overhead. This work provides a comprehensive analysis and insights on using AutoML for static and online malware detection. For static, our analysis is performed on two widely used malware datasets: SOREL-20M to demonstrate efficacy on large datasets; and EMBER-2018, a smaller dataset specifically curated to hinder the performance of machine learning models. In addition, we show the effects of tuning the NAS process parameters on finding a more optimal malware detection model on these static analysis datasets. Further, we also demonstrate that AutoML is performant in online malware detection scenarios using Convolutional Neural Networks (CNNs) for cloud IaaS. We compare an AutoML technique to six existing state-of-the-art CNNs using a newly generated online malware dataset with and without other applications running in the background during malware execution.In general, our experimental results show that the performance of AutoML based static and online malware detection models are on par or even better than state-of-the-art models or hand-designed models presented in literature. △ Less

Submitted 3 November, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2301.01044 [pdf, other]

Analysis of Label-Flip Poisoning Attack on Machine Learning Based Malware Detector

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam

Abstract: With the increase in machine learning (ML) applications in different domains, incentives for deceiving these models have reached more than ever. As data is the core backbone of ML algorithms, attackers shifted their interest toward polluting the training data. Data credibility is at even higher risk with the rise of state-of-art research topics like open design principles, federated learning, and… ▽ More With the increase in machine learning (ML) applications in different domains, incentives for deceiving these models have reached more than ever. As data is the core backbone of ML algorithms, attackers shifted their interest toward polluting the training data. Data credibility is at even higher risk with the rise of state-of-art research topics like open design principles, federated learning, and crowd-sourcing. Since the machine learning model depends on different stakeholders for obtaining data, there are no reliable automated mechanisms to verify the veracity of data from each source. Malware detection is arduous due to its malicious nature with the addition of metamorphic and polymorphic ability in the evolving samples. ML has proven to solve the zero-day malware detection problem, which is unresolved by traditional signature-based approaches. The poisoning of malware training data can allow the malware files to go undetected by the ML-based malware detectors, hel** the attackers to fulfill their malicious goals. A feasibility analysis of the data poisoning threat in the malware detection domain is still lacking. Our work will focus on two major sections: training ML-based malware detectors and poisoning the training data using the label-poisoning approach. We will analyze the robustness of different machine learning models against data poisoning with varying volumes of poisoning data. △ Less

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2210.14862 [pdf, other]

Visual Semantic Parsing: From Images to Abstract Meaning Representation

Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs o… ▽ More The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input. Moreover, they allow us to generate meta-AMR graphs to unify information contained in multiple image descriptions under one representation. Through extensive experimentation and analysis, we demonstrate that we can re-purpose an existing text-to-AMR parser to parse images into AMRs. Our findings point to important future research directions for improved scene understanding. △ Less

Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: published in CoNLL 2022

arXiv:2208.04891 [pdf, other]

Online Malware Classification with System-Wide System Calls in Cloud IaaS

Authors: Phillip Brown, Austin Brown, Maanak Gupta, Mahmoud Abdelsalam

Abstract: Accurately classifying malware in an environment allows the creation of better response and remediation strategies by cyber analysts. However, classifying malware in a live environment is a difficult task due to the large number of system data sources. Collecting statistics from these separate sources and processing them together in a form that can be used by a machine learning model is difficult.… ▽ More Accurately classifying malware in an environment allows the creation of better response and remediation strategies by cyber analysts. However, classifying malware in a live environment is a difficult task due to the large number of system data sources. Collecting statistics from these separate sources and processing them together in a form that can be used by a machine learning model is difficult. Fortunately, all of these resources are mediated by the operating system's kernel. User programs, malware included, interacts with system resources by making requests to the kernel with system calls. Collecting these system calls provide insight to the interaction with many system resources in a single location. Feeding these system calls into a performant model such as a random forest allows fast, accurate classification in certain situations. In this paper, we evaluate the feasibility of using system call sequences for online malware classification in both low-activity and heavy-use Cloud IaaS. We collect system calls as they are received by the kernel and take n-gram sequences of calls to use as features for tree-based machine learning models. We discuss the performance of the models on baseline systems with no extra running services and systems under heavy load and the performance gap between them. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted at the IEEE 23rd International Conference on Information Reuse and Integration for Data Science

arXiv:2207.04354 [pdf, other]

An Introduction to Lifelong Supervised Learning

Authors: Shagun Sodhani, Mojtaba Faramarzi, Sanket Vaibhav Mehta, Pranshu Malviya, Mohamed Abdelsalam, Janarthanan Janarthanan, Sarath Chandar

Abstract: This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2 which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide 8 Introduction a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the des… ▽ More This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2 which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide 8 Introduction a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7. △ Less

Submitted 12 July, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: Lifelong Learning Primer

arXiv:2207.01739 [pdf, other]

Machine Learning in Access Control: A Taxonomy and Survey

Authors: Mohammad Nur Nobi, Maanak Gupta, Lopamudra Praharaj, Mahmoud Abdelsalam, Ram Krishnan, Ravi Sandhu

Abstract: An increasing body of work has recognized the importance of exploiting machine learning (ML) advancements to address the need for efficient automation in extracting access control attributes, policy mining, policy verification, access decisions, etc. In this work, we survey and summarize various ML approaches to solve different access control problems. We propose a novel taxonomy of the ML model's… ▽ More An increasing body of work has recognized the importance of exploiting machine learning (ML) advancements to address the need for efficient automation in extracting access control attributes, policy mining, policy verification, access decisions, etc. In this work, we survey and summarize various ML approaches to solve different access control problems. We propose a novel taxonomy of the ML model's application in the access control domain. We highlight current limitations and open challenges such as lack of public real-world datasets, administration of ML-based access control systems, understanding a black-box ML model's decision, etc., and enumerate future research directions. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: Submitted to ACM Computing Survey

arXiv:2203.12469 [pdf]

3D Adapted Random Forest Vision (3DARFV) for Untangling Heterogeneous-Fabric Exceeding Deep Learning Semantic Segmentation Efficiency at the Utmost Accuracy

Authors: Omar Alfarisi, Zeyar Aung, Qingfeng Huang, Ashraf Al-Khateeb, Hamed Alhashmi, Mohamed Abdelsalam, Salem Alzaabi, Haifa Alyazeedi, Anthony Tzes

Abstract: Planetary exploration depends heavily on 3D image data to characterize the static and dynamic properties of the rock and environment. Analyzing 3D images requires many computations, causing efficiency to suffer lengthy processing time alongside large energy consumption. High-Performance Computing (HPC) provides apparent efficiency at the expense of energy consumption. However, for remote explorati… ▽ More Planetary exploration depends heavily on 3D image data to characterize the static and dynamic properties of the rock and environment. Analyzing 3D images requires many computations, causing efficiency to suffer lengthy processing time alongside large energy consumption. High-Performance Computing (HPC) provides apparent efficiency at the expense of energy consumption. However, for remote explorations, the conveyed surveillance and the robotized sensing need faster data analysis with ultimate accuracy to make real-time decisions. In such environments, access to HPC and energy is limited. Therefore, we realize that reducing the number of computations to optimal and maintaining the desired accuracy leads to higher efficiency. This paper demonstrates the semantic segmentation capability of a probabilistic decision tree algorithm, 3D Adapted Random Forest Vision (3DARFV), exceeding deep learning algorithm efficiency at the utmost accuracy. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2111.08223 [pdf, other]

A Survey on Adversarial Attacks for Malware Analysis

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam

Abstract: Machine learning has witnessed tremendous growth in its adoption and advancement in the last decade. The evolution of machine learning from traditional algorithms to modern deep learning architectures has shaped the way today's technology functions. Its unprecedented ability to discover knowledge/patterns from unstructured data and automate the decision-making process led to its application in wid… ▽ More Machine learning has witnessed tremendous growth in its adoption and advancement in the last decade. The evolution of machine learning from traditional algorithms to modern deep learning architectures has shaped the way today's technology functions. Its unprecedented ability to discover knowledge/patterns from unstructured data and automate the decision-making process led to its application in wide domains. High flying machine learning arena has been recently pegged back by the introduction of adversarial attacks. Adversaries are able to modify data, maximizing the classification error of the models. The discovery of blind spots in machine learning models has been exploited by adversarial attackers by generating subtle intentional perturbations in test samples. Increasing dependency on data has paved the blueprint for ever-high incentives to camouflage machine learning models. To cope with probable catastrophic consequences in the future, continuous research is required to find vulnerabilities in form of adversarial and design remedies in systems. This survey aims at providing the encyclopedic introduction to adversarial attacks that are carried out against malware detection systems. The paper will introduce various machine learning techniques used to generate adversarial and explain the structure of target files. The survey will also model the threat posed by the adversary and followed by brief descriptions of widely accepted adversarial algorithms. Work will provide a taxonomy of adversarial evasion attacks on the basis of attack domain and adversarial generation techniques. Adversarial evasion attacks carried out against malware detectors will be discussed briefly under each taxonomical headings and compared with concomitant researches. Analyzing the current research challenges in an adversarial generation, the survey will conclude by pinpointing the open future research directions. △ Less

Submitted 5 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 48 Pages, 31 Figures, 11 Tables

arXiv:2111.00099 [pdf, other]

Autoencoder-based Anomaly Detection in Smart Farming Ecosystem

Authors: Mary Adkisson, Jeffrey C Kimmel, Maanak Gupta, Mahmoud Abdelsalam

Abstract: The inclusion of Internet of Things (IoT) devices is growing rapidly in all application domains. Smart Farming supports devices connected, and with the support of Internet, cloud or edge computing infrastructure provide remote control of watering and fertilization, real time monitoring of farm conditions, and provide solutions to more sustainable practices. This could involve using irrigation syst… ▽ More The inclusion of Internet of Things (IoT) devices is growing rapidly in all application domains. Smart Farming supports devices connected, and with the support of Internet, cloud or edge computing infrastructure provide remote control of watering and fertilization, real time monitoring of farm conditions, and provide solutions to more sustainable practices. This could involve using irrigation systems only when the detected soil moisture level is low or stop when the plant reaches a sufficient level of soil moisture content. These improvements to efficiency and ease of use come with added risks to security and privacy. Cyber attacks in large coordinated manner can disrupt economy of agriculture-dependent nations. To the sensors in the system, an attack may appear as anomalous behaviour. In this context, there are possibilities of anomalies generated due to faulty hardware, issues in network connectivity (if present), or simply abrupt changes to the environment due to weather, human accident, or other unforeseen circumstances. To make such systems more secure, it is imperative to detect such data discrepancies, and trigger appropriate mitigation mechanisms. In this paper, we propose an anomaly detection model for Smart Farming using an unsupervised Autoencoder machine learning model. We chose to use an Autoencoder because it encodes and decodes data and attempts to ignore outliers. When it encounters anomalous data the result will be a high reconstruction loss value, signaling that this data was not like the rest. Our model was trained and tested on data collected from our designed greenhouse test-bed. Proposed Autoencoder model based anomaly detection achieved 98.98% and took 262 seconds to train and has a detection time of .0585 seconds. △ Less

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2106.10619 [pdf, other]

A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss

Authors: Prasanna Parthasarathi, Mohamed Abdelsalam, Joelle Pineau, Sarath Chandar

Abstract: Neural models trained for next utterance generation in dialogue task learn to mimic the n-gram sequences in the training set with training objectives like negative log-likelihood (NLL) or cross-entropy. Such commonly used training objectives do not foster generating alternate responses to a context. But, the effects of minimizing an alternate training objective that fosters a model to generate alt… ▽ More Neural models trained for next utterance generation in dialogue task learn to mimic the n-gram sequences in the training set with training objectives like negative log-likelihood (NLL) or cross-entropy. Such commonly used training objectives do not foster generating alternate responses to a context. But, the effects of minimizing an alternate training objective that fosters a model to generate alternate response and score it on semantic similarity has not been well studied. We hypothesize that a language generation model can improve on its diversity by learning to generate alternate text during training and minimizing a semantic loss as an auxiliary objective. We explore this idea on two different sized data sets on the task of next utterance generation in goal oriented dialogues. We make two observations (1) minimizing a semantic objective improved diversity in responses in the smaller data set (Frames) but only as-good-as minimizing the NLL in the larger data set (MultiWoZ) (2) large language model embeddings can be more useful as a semantic loss objective than as initialization for token embeddings. △ Less

Submitted 20 June, 2021; originally announced June 2021.

Comments: Accepted at SIGDial 2021

arXiv:2105.09268 [pdf, other]

Analyzing Machine Learning Approaches for Online Malware Detection in Cloud

Authors: Jeffrey C Kimmell, Mahmoud Abdelsalam, Maanak Gupta

Abstract: The variety of services and functionality offered by various cloud service providers (CSP) have exploded lately. Utilizing such services has created numerous opportunities for enterprises infrastructure to become cloud-based and, in turn, assisted the enterprises to easily and flexibly offer services to their customers. The practice of renting out access to servers to clients for computing and sto… ▽ More The variety of services and functionality offered by various cloud service providers (CSP) have exploded lately. Utilizing such services has created numerous opportunities for enterprises infrastructure to become cloud-based and, in turn, assisted the enterprises to easily and flexibly offer services to their customers. The practice of renting out access to servers to clients for computing and storage purposes is known as Infrastructure as a Service (IaaS). The popularity of IaaS has led to serious and critical concerns with respect to the cyber security and privacy. In particular, malware is often leveraged by malicious entities against cloud services to compromise sensitive data or to obstruct their functionality. In response to this growing menace, malware detection for cloud environments has become a widely researched topic with numerous methods being proposed and deployed. In this paper, we present online malware detection based on process level performance metrics, and analyze the effectiveness of different baseline machine learning models including, Support Vector Classifier (SVC), Random Forest Classifier (RFC), KNearest Neighbor (KNN), Gradient Boosted Classifier (GBC), Gaussian Naive Bayes (GNB) and Convolutional Neural Networks (CNN). Our analysis conclude that neural network models can most accurately detect the impact malware have on the process level features of virtual machines in the cloud, and therefore are best suited to detect them. Our models were trained, validated, and tested by using a dataset of 40,680 malicious and benign samples. The dataset was complied by running different families of malware (collected from VirusTotal) in a live cloud environment and collecting the process level features. △ Less

Submitted 19 May, 2021; originally announced May 2021.

arXiv:2012.12477 [pdf, other]

IIRC: Incremental Implicitly-Refined Classification

Authors: Mohamed Abdelsalam, Mojtaba Faramarzi, Shagun Sodhani, Sarath Chandar

Abstract: We introduce the "Incremental Implicitly-Refined Classi-fication (IIRC)" setup, an extension to the class incremental learning setup where the incoming batches of classes have two granularity levels. i.e., each sample could have a high-level (coarse) label like "bear" and a low-level (fine) label like "polar bear". Only one label is provided at a time, and the model has to figure out the other lab… ▽ More We introduce the "Incremental Implicitly-Refined Classi-fication (IIRC)" setup, an extension to the class incremental learning setup where the incoming batches of classes have two granularity levels. i.e., each sample could have a high-level (coarse) label like "bear" and a low-level (fine) label like "polar bear". Only one label is provided at a time, and the model has to figure out the other label if it has already learnfed it. This setup is more aligned with real-life scenarios, where a learner usually interacts with the same family of entities multiple times, discovers more granularity about them, while still trying not to forget previous knowledge. Moreover, this setup enables evaluating models for some important lifelong learning challenges that cannot be easily addressed under the existing setups. These challenges can be motivated by the example "if a model was trained on the class bear in one task and on polar bear in another task, will it forget the concept of bear, will it rightfully infer that a polar bear is still a bear? and will it wrongfully associate the label of polar bear to other breeds of bear?". We develop a standardized benchmark that enables evaluating models on the IIRC setup. We evaluate several state-of-the-art lifelong learning algorithms and highlight their strengths and limitations. For example, distillation-based methods perform relatively well but are prone to incorrectly predicting too many labels per image. We hope that the proposed setup, along with the benchmark, would provide a meaningful problem setting to the practitioners △ Less

Submitted 11 January, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

arXiv:2009.11101 [pdf, other]

AI assisted Malware Analysis: A Course for Next Generation Cybersecurity Workforce

Authors: Maanak Gupta, Sudip Mittal, Mahmoud Abdelsalam

Abstract: The use of Artificial Intelligence (AI) and Machine Learning (ML) to solve cybersecurity problems has been gaining traction within industry and academia, in part as a response to widespread malware attacks on critical systems, such as cloud infrastructures, government offices or hospitals, and the vast amounts of data they generate. AI- and ML-assisted cybersecurity offers data-driven automation t… ▽ More The use of Artificial Intelligence (AI) and Machine Learning (ML) to solve cybersecurity problems has been gaining traction within industry and academia, in part as a response to widespread malware attacks on critical systems, such as cloud infrastructures, government offices or hospitals, and the vast amounts of data they generate. AI- and ML-assisted cybersecurity offers data-driven automation that could enable security systems to identify and respond to cyber threats in real time. However, there is currently a shortfall of professionals trained in AI and ML for cybersecurity. Here we address the shortfall by develo** lab-intensive modules that enable undergraduate and graduate students to gain fundamental and advanced knowledge in applying AI and ML techniques to real-world datasets to learn about Cyber Threat Intelligence (CTI), malware analysis, and classification, among other important topics in cybersecurity. Here we describe six self-contained and adaptive modules in "AI-assisted Malware Analysis." Topics include: (1) CTI and malware attack stages, (2) malware knowledge representation and CTI sharing, (3) malware data collection and feature identification, (4) AI-assisted malware detection, (5) malware classification and attribution, and (6) advanced malware research topics and case studies such as adversarial learning and Advanced Persistent Threat (APT) detection. △ Less

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2004.09246 [pdf, other]

Enabling and Enforcing Social Distancing Measures using Smart City and ITS Infrastructures: A COVID-19 Use Case

Authors: Maanak Gupta, Mahmoud Abdelsalam, Sudip Mittal

Abstract: Internet of Things is a revolutionary domain that has the caliber to impact our lives and bring significant changes to the world. Several IoT applications have been envisioned to facilitate data driven and smart application for the user. Smart City and Intelligent Transportation System (ITS) offer a futuristic vision of smart, secure and safe experience to the end user, and at the same time effici… ▽ More Internet of Things is a revolutionary domain that has the caliber to impact our lives and bring significant changes to the world. Several IoT applications have been envisioned to facilitate data driven and smart application for the user. Smart City and Intelligent Transportation System (ITS) offer a futuristic vision of smart, secure and safe experience to the end user, and at the same time efficiently manage the sparse resources and optimize the efficiency of city operations. However, outbreaks and pandemics like COVID-19 have revealed limitations of the existing deployments, therefore, architecture, applications and technology systems need to be developed for swift and timely enforcement of guidelines, rules and government orders to contain such future outbreaks. This work outlines novel architecture, potential use-cases and some future directions in develo** such applications using Smart City and ITS. △ Less

Submitted 13 April, 2020; originally announced April 2020.

arXiv:2002.06383 [pdf, other]

doi 10.1007/978-3-030-59635-4_5

Analyzing CNN Based Behavioural Malware Detection Techniques on Cloud IaaS

Authors: Andrew McDole, Mahmoud Abdelsalam, Maanak Gupta, Sudip Mittal

Abstract: Cloud Infrastructure as a Service (IaaS) is vulnerable to malware due to its exposure to external adversaries, making it a lucrative attack vector for malicious actors. A datacenter infected with malware can cause data loss and/or major disruptions to service for its users. This paper analyzes and compares various Convolutional Neural Networks (CNNs) for online detection of malware in cloud IaaS.… ▽ More Cloud Infrastructure as a Service (IaaS) is vulnerable to malware due to its exposure to external adversaries, making it a lucrative attack vector for malicious actors. A datacenter infected with malware can cause data loss and/or major disruptions to service for its users. This paper analyzes and compares various Convolutional Neural Networks (CNNs) for online detection of malware in cloud IaaS. The detection is performed based on behavioural data using process level performance metrics including cpu usage, memory usage, disk usage etc. We have used the state of the art DenseNets and ResNets in effectively detecting malware in online cloud system. CNN are designed to extract features from data gathered from a live malware running on a real cloud environment. Experiments are performed on OpenStack (a cloud IaaS software) testbed designed to replicate a typical 3-tier web architecture. Comparative analysis is performed for different metrics for different CNN models used in this research. △ Less

Submitted 15 February, 2020; originally announced February 2020.

arXiv:1609.07750 [pdf, other]

Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter

Authors: Ahmed M. Abdelsalam, J. M. Pierre Langlois, F. Cheriet

Abstract: Implementing an accurate and fast activation function with low cost is a crucial aspect to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high-accuracy approximation approach for the hyperbolic tangent activation function of artificial neurons in DNNs. It is based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed architecture combines simple arit… ▽ More Implementing an accurate and fast activation function with low cost is a crucial aspect to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high-accuracy approximation approach for the hyperbolic tangent activation function of artificial neurons in DNNs. It is based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed architecture combines simple arithmetic operations on stored samples of the hyperbolic tangent function and on input data. The proposed DCTIF implementation achieves two orders of magnitude greater precision than previous work while using the same or fewer computational resources. Various combinations of DCTIF parameters can be chosen to tradeoff the accuracy and complexity of the hyperbolic tangent function. In one case, the proposed architecture approximates the hyperbolic tangent activation function with 10E-5 maximum error while requiring only 1.52 Kbits memory and 57 LUTs of a Virtex-7 FPGA. We also discuss how the activation function accuracy affects the performance of DNNs in terms of their training and testing accuracies. We show that a high accuracy approximation can be necessary in order to maintain the same DNN training and testing performances realized by the exact function. △ Less

Submitted 25 September, 2016; originally announced September 2016.

Comments: 8 pages, 6 figures, 5 tables, submitted for the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA), 22-24 February 2017, California, USA

Showing 1–21 of 21 results for author: Abdelsalam, M