Search | arXiv e-print repository

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

Authors: Asmar Nadeem, Faegheh Sardari, Robert Dawes, Syed Sameed Husain, Adrian Hilton, Armin Mustafa

Abstract: Existing video captioning benchmarks and models lack coherent representations of causal-temporal narrative, which is sequences of events linked through cause and effect, unfolding over time and driven by characters or agents. This lack of narrative restricts models' ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, we propose NarrativeBridge, an approach comprising of: (1) a novel Causal-Temporal Narrative (CTN) captions benchmark generated using a large language model and few-shot prompting, explicitly encoding cause-effect temporal relationships in video descriptions, evaluated automatically to ensure caption quality and relevance; and (2) a dedicated Cause-Effect Network (CEN) architecture with separate encoders for capturing cause and effect dynamics independently, enabling effective learning and generation of captions with causal-temporal narrative. Extensive experiments demonstrate that CEN is more accurate in articulating the causal and temporal aspects of video content than the second best model (GIT): 17.88 and 17.44 CIDEr on the MSVD and MSR-VTT datasets, respectively. The proposed framework understands and generates nuanced text descriptions with intricate causal-temporal narrative structures present in videos, addressing a critical limitation in video captioning. For project details, visit https://narrativebridge.github.io/.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2310.16754 [pdf, other]

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings; the audio-visual (AV) information passing through the network isn't aligned on Spatial and Temporal levels; and, inter-modal (audio and visual) Semantic information is often not balanced within a context; this results in poor performance. In this paper, we propose a novel end-to-end Contextual Multi-modal Alignment (CAD) network that addresses the challenges in AVQA methods by i) introducing a parameter-free stochastic Contextual block that ensures robust audio and visual alignment on the Spatial level; ii) proposing a pre-training technique for dynamic audio and visual alignment on Temporal level in a self-supervised setting, and iii) introducing a cross-attention mechanism to balance audio and visual information on Semantic level. The proposed novel CAD network improves the overall performance over the state-of-the-art methods on average by 9.4% on the MUSIC-AVQA dataset. We also demonstrate that our proposed contributions to AVQA can be added to the existing methods to improve their performance without additional complexity requirements.

Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2310.13079 [pdf, other]

Critical Path Prioritization Dashboard for Alert-driven Attack Graphs

Authors: Sònia Leal Díaz, Sergio Pastrana, Azqa Nadeem

Abstract: Although intrusion alerts can provide threat intelligence regarding attacker strategies, extracting such intelligence via existing tools is expensive and time-consuming. Earlier work has proposed SAGE, which generates attack graphs from intrusion alerts using unsupervised sequential machine learning. This paper proposes a querying and prioritization-enabled visual analytics dashboard for SAGE. The dashboard has three main components: (i) a Graph Explorer that presents a global view of all attacker strategies, (ii) a Timeline Viewer that correlates attacker actions chronologically, and (iii) a Recommender Matrix that highlights prevalent critical alerts via a MITRE ATT&CK-inspired attack stage matrix. We describe the utility of the proposed dashboard using intrusion alerts collected from a distributed multi-stage team-based attack scenario. We evaluate the utility of the dashboard through a user study. Based on the responses of a small set of security practitioners, we find that the dashboard is useful in depicting attacker strategies and attack progression, but can be improved in terms of usability.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.07625 [pdf, other]

doi 10.1145/3626252.3630821

Cybersecurity as a Crosscutting Concept Across an Undergrad Computer Science Curriculum: An Experience Report

Authors: Azqa Nadeem

Abstract: Although many Computer Science (CS) programs offer cybersecurity courses, they are typically optional and placed at the periphery of the program. We advocate to integrate cybersecurity as a crosscutting concept in CS curricula, which is also consistent with latest cybersecurity curricular guidelines, e.g., CSEC2017. We describe our experience of implementing this crosscutting intervention across three undergraduate core CS courses at a leading technical university in Europe between 2018 and 2023, collectively educating over 2200 students. The security education was incorporated within CS courses using a partnership between the responsible course instructor and a security expert, i.e., the security expert (after consultation with course instructors) developed and taught lectures covering multiple CSEC2017 knowledge areas. This created a complex dynamic between three stakeholders: the course instructor, the security expert, and the students. We reflect on our intervention from the perspective of the three stakeholders -- we conducted a post-course survey to collect student perceptions, and semi-supervised interviews with responsible course instructors and the security expert to gauge their experience. We found that while the students were extremely enthusiastic about the security content and retained its impact several years later, the misaligned incentives for the instructors and the security expert made it difficult to sustain this intervention without organizational support. By identifying limitations in our intervention, we suggest ideas for sustaining it.

Submitted 16 January, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 6 pages; Accepted at SIGCSE TS '24

arXiv:2308.16464 [pdf, other]

MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities

Authors: Anas Nadeem, Muhammad Usman Sarwar, Muhammad Zubair Malik

Abstract: Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports, and enhancement requests. Incoming issue-reports on these issue tracking systems must be managed in an effective manner. First, they must be labelled and then assigned to a particular developer with relevant expertise. This handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report making it a labor-intensive task. In this paper, we present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise. We use the Bidirectional Encoder Representations from Transformers (BERT), as an underlying model for MaintainoMATE to learn the contextual information for automatic issue-report labeling and assignment tasks. We deploy the framework used in this work as a GitHub application. We empirically evaluate our approach on GitHub issue-reports to show its capability of assigning labels to the issue-reports. We were able to achieve an F1-score close to 80%, which is comparable to existing state-of-the-art results. Similarly, our initial evaluations show that we can assign relevant developers to the issue-reports with an F1 score of 54%, which is a significant improvement over existing approaches. Our initial findings suggest that MaintainoMATE has the potential of improving software quality and reducing maintenance costs by accurately automating activities involved in the maintenance processes. Our future work would be directed towards improving the issue-assignment module.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2303.14829 [pdf, other]

SEM-POS: Grammatically and Semantically Correct Video Captioning

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: Generating grammatically and semantically correct captions in video captioning is a challenging task. The captions generated from the existing methods are either word-by-word that do not align with grammatical structure or miss key information from the input videos. To address these issues, we introduce a novel global-local fusion network, with a Global-Local Fusion Block (GLFB) that encodes and fuses features from different parts of speech (POS) components with visual-spatial features. We use novel combinations of different POS components - 'determinant + subject', 'auxiliary verb', 'verb', and 'determinant + object' for supervision of the POS blocks - Det + Subject, Aux Verb, Verb, and Det + Object respectively. The novel global-local fusion network together with POS blocks helps align the visual features with language description to generate grammatically and semantically correct captions. Extensive qualitative and quantitative experiments on benchmark MSVD and MSRVTT datasets demonstrate that the proposed approach generates more grammatically and semantically correct captions compared to the existing methods, achieving the new state-of-the-art. Ablations on the POS blocks and the GLFB demonstrate the impact of the contributions on the proposed method.

Submitted 4 April, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2208.10605 [pdf, other]

SoK: Explainable Machine Learning for Computer Security Applications

Authors: Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer

Abstract: Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, who utilize XAI for 4 distinct objectives within an ML pipeline, namely 1) XAI-enabled user assistance, 2) XAI-enabled model verification, 3) explanation verification & robustness, and 4) offensive use of explanations. Our analysis of the literature indicates that many of the XAI applications are designed with little understanding of how they might be integrated into analyst workflows -- user studies for explanation evaluation are conducted in only 14% of the cases. The security literature sometimes also fails to disentangle the role of the various stakeholders, e.g., by providing explanations to model users and designers while also exposing them to adversaries. Additionally, the role of model designers is particularly minimized in the security literature. To this end, we present an illustrative tutorial for model designers, demonstrating how XAI can help with model verification. We also discuss scenarios where interpretability by design may be a better alternative. The systematization and the tutorial enable us to challenge several assumptions, and present open problems that can help shape the future of XAI research within cybersecurity.

Submitted 3 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 13 pages. Accepted at Euro S&P

arXiv:2206.12190 [pdf, other]

SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Authors: Azqa Nadeem, Sicco Verwer

Abstract: Sequence clustering in a streaming environment is challenging because it is computationally expensive, and the sequences may evolve over time. K-medoids or Partitioning Around Medoids (PAM) is commonly used to cluster sequences since it supports alignment-based distances, and the k-centers being actual data items helps with cluster interpretability. However, offline k-medoids has no support for concept drift, while also being prohibitively expensive for clustering data streams. We therefore propose SECLEDS, a streaming variant of the k-medoids algorithm with constant memory footprint. SECLEDS has two unique properties: i) it uses multiple medoids per cluster, producing stable high-quality clusters, and ii) it handles concept drift using an intuitive Medoid Voting scheme for approximating cluster distances. Unlike existing adaptive algorithms that create new clusters for new concepts, SECLEDS follows a fundamentally different approach, where the clusters themselves evolve with an evolving stream. Using real and synthetic datasets, we empirically demonstrate that SECLEDS produces high-quality clusters regardless of drift, stream size, data dimensionality, and number of clusters. We compare against three popular stream and batch clustering algorithms. The state-of-the-art BanditPAM is used as an offline benchmark. SECLEDS achieves comparable F1 score to BanditPAM while reducing the number of required distance computations by 83.7%. Importantly, SECLEDS outperforms all baselines by 138.7% when the stream contains drift. We also cluster real network traffic, and provide evidence that SECLEDS can support network bandwidths of up to 1.08 Gbps while using the (expensive) dynamic time warping distance.

Submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted to appear in ECML/PKDD 2022
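
The SECLEDS abstract above refers to the dynamic time warping (DTW) distance as an expensive alignment-based distance. As a rough illustration of why it is costly, here is a minimal sketch of the textbook DTW recurrence in Python. It is not taken from the paper: the function name `dtw_distance`, the use of absolute difference as the local cost, and the numeric sequences are all assumptions made for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    The table has (len(a) + 1) * (len(b) + 1) cells, so one comparison
    costs O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because every pair of positions is visited, each distance computation is quadratic in sequence length, which is consistent with the abstract's emphasis on reducing the number of distance computations.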

arXiv:2206.07776 [pdf, other]

Robust Attack Graph Generation

Authors: Dennis Mouwen, Sicco Verwer, Azqa Nadeem

Abstract: We present a method to learn automaton models that are more robust to input modifications. It iteratively aligns sequences to a learned model, modifies the sequences to their aligned versions, and re-learns the model. Automaton learning algorithms are typically very good at modeling the frequent behavior of a software system. Our solution can be used to also learn the behavior present in infrequent sequences, as these will be aligned to the frequent ones represented by the model. We apply our method to the SAGE tool for modeling attacker behavior from intrusion alerts. In experiments, we demonstrate that our algorithm learns models that can handle noise such as added and removed symbols from sequences. Furthermore, it learns more concise models that fit better to the training data.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Appeared at LearnAut '22

arXiv:2205.01512 [pdf, other]

Fair Feature Subset Selection using Multiobjective Genetic Algorithm

Authors: Ayaz Ur Rehman, Anas Nadeem, Muhammad Zubair Malik

Abstract: The feature subset selection problem aims at selecting the relevant subset of features to improve the performance of a Machine Learning (ML) algorithm on training data. Some features in data can be inherently noisy, costly to compute, improperly scaled, or correlated to other features, and they can adversely affect the accuracy, cost, and complexity of the induced algorithm. The goal of traditional feature selection approaches has been to remove such irrelevant features. In recent years ML is making a noticeable impact on the decision-making processes of our everyday lives. We want to ensure that these decisions do not reflect biased behavior towards certain groups or individuals based on protected attributes such as age, sex, or race. In this paper, we present a feature subset selection approach that improves both fairness and accuracy objectives and computes Pareto-optimal solutions using the NSGA-II algorithm. We use statistical disparity as a fairness metric and F1-Score as a metric for model performance. Our experiments on the most commonly used fairness benchmark datasets with three different machine learning algorithms show that using the evolutionary algorithm we can effectively explore the trade-off between fairness and accuracy.

Submitted 30 April, 2022; originally announced May 2022.
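
The abstract above combines a fairness metric with F1-Score and looks for Pareto-optimal trade-offs. The sketch below illustrates those two ideas only; it is not the authors' implementation and it does not implement NSGA-II. It assumes that "statistical disparity" means the absolute gap in positive-prediction rates between two groups (often called statistical parity difference), and the names `statistical_disparity` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple


def statistical_disparity(y_pred: Sequence[int], protected: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rate between two groups.

    Assumption: `protected` holds 0/1 group membership and `y_pred` holds
    0/1 predictions. A value of 0.0 means both groups receive positive
    predictions at the same rate.
    """
    group1 = [p for p, g in zip(y_pred, protected) if g == 1]
    group0 = [p for p, g in zip(y_pred, protected) if g == 0]
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the non-dominated (f1, disparity) pairs.

    F1 is maximised and disparity is minimised. A point is dominated when
    another point is at least as good on both objectives and strictly
    better on at least one.
    """
    front = []
    for i, (f1_i, d_i) in enumerate(points):
        dominated = any(
            f1_j >= f1_i and d_j <= d_i and (f1_j > f1_i or d_j < d_i)
            for j, (f1_j, d_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((f1_i, d_i))
    return front
```

Each candidate feature subset would contribute one `(f1, disparity)` pair. A multiobjective optimiser such as NSGA-II searches for subsets whose pairs lie on this front, whereas the filter here only identifies the front among pairs that have already been evaluated.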

arXiv:2204.07210 [pdf, other]

doi 10.1145/3510455.3512777

A Case for Microservices Orchestration Using Workflow Engines

Authors: Anas Nadeem, Muhammad Zubair Malik

Abstract: Microservices have become the de-facto software architecture for cloud-native applications. A contentious architectural decision in microservices is to compose them using choreography or orchestration. In choreography, every service works independently, whereas, in orchestration, there is a controller that coordinates service interactions. This paper makes a case for orchestration. The promise of microservices is that each microservice can be independently developed, deployed, tested, upgraded, and scaled. This makes them suitable for systems running on cloud infrastructures. However, microservice-based systems become complicated due to the complex interactions of various services, concurrent events, failing components, developers' lack of global view, and configurations of the environment. This makes maintaining and debugging such systems very challenging. We hypothesize that orchestrated services are easier to debug and to test this we ported the largest publicly available microservices' benchmark TrainTicket, which is implemented using choreography, to a fault-oblivious stateful workflow framework Temporal. We report our experience in porting the code from traditional choreographed microservice architecture to one orchestrated by Temporal and present our initial findings of time to debug the 22 bugs present in the benchmark. Our findings suggest that an effort towards making a transition to orchestrated approach is worthwhile, making the ported code easier to debug.

Submitted 14 April, 2022; originally announced April 2022.

Comments: 5 pages, International Conference on Software Engineering 2022 (NIER) Track

arXiv:2202.06149 [pdf, other]

doi 10.1109/ISSREW53611.2021.00113

Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports

Authors: Anas Nadeem, Muhammad Usman Sarwar, Muhammad Zubair Malik

Abstract: Issue tracking systems are used in the software industry for the facilitation of maintenance activities that keep the software robust and up to date with ever-changing industry requirements. Usually, users report issues that can be categorized into different labels such as bug reports, enhancement requests, and questions related to the software. Most of the issue tracking systems make the labelling of these issue reports optional for the issue submitter, which leads to a large number of unlabeled issue reports. In this paper, we present a state-of-the-art method to classify the issue reports into their respective categories i.e. bug, enhancement, and question. This is a challenging task because of the common use of informal language in the issue reports. Existing studies use traditional natural language processing approaches adopting key-word based features, which fail to incorporate the contextual relationship between words and therefore result in a high rate of false positives and false negatives. Moreover, previous works utilize a uni-label approach to classify the issue reports however, in reality, an issue-submitter can tag one issue report with more than one label at a time. This paper presents our approach to classify the issue reports in a multi-label setting. We use an off-the-shelf neural network called RoBERTa and fine-tune it to classify the issue reports. We validate our approach on issue reports belonging to numerous industrial projects from GitHub. We were able to achieve promising F-1 scores of 81%, 74%, and 80% for bug reports, enhancements, and questions, respectively. We also develop an industry tool called Automatic Issue Classifier (AIC), which automatically assigns labels to newly reported issues on GitHub repositories with high accuracy.

Submitted 12 February, 2022; originally announced February 2022.

Comments: 6 pages, International Symposium on Software Reliability Engineering (ISSRE 2021) Industry Track

arXiv:2107.02783 [pdf, other]

SAGE: Intrusion Alert-driven Attack Graph Extractor

Authors: Azqa Nadeem, Sicco Verwer, Shanchieh Jay Yang

Abstract: Attack graphs (AG) are used to assess pathways availed by cyber adversaries to penetrate a network. State-of-the-art approaches for AG generation focus mostly on deriving dependencies between system vulnerabilities based on network scans and expert knowledge. In real-world operations however, it is costly and ineffective to rely on constant vulnerability scanning and expert-crafted AGs. We propose to automatically learn AGs based on actions observed through intrusion alerts, without prior expert knowledge. Specifically, we develop an unsupervised sequence learning system, SAGE, that leverages the temporal and probabilistic dependence between alerts in a suffix-based probabilistic deterministic finite automaton (S-PDFA) -- a model that accentuates infrequent severe alerts and summarizes paths leading to them. AGs are then derived from the S-PDFA on a per-objective, per-victim basis. Tested with intrusion alerts collected through Collegiate Penetration Testing Competition, SAGE compresses over 330k alerts into 93 AGs. These AGs reflect the strategies used by the participating teams. The AGs are succinct, interpretable, and capture behavioral dynamics, e.g., that attackers will often follow shorter paths to re-exploit objectives.

Submitted 14 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: Appeared at VizSec '21 (proceedings) and KDD AI4Cyber '21 (without proceedings)

arXiv:1907.08083

Laptop Theft in a University Setting can be Avoided with Warnings

Authors: Azqa Nadeem, Marianne Junger

Abstract: Laptops have become an indispensable asset in today's digital age. They often contain highly sensitive information, such as credentials and confidential documents. As a result, the value of a laptop is an accumulation of the value of both the physical device itself and the cyber assets it contains, making it a lucrative target for theft. Educational institutions have a large population of potential victims of laptop theft. To mitigate this risk, we investigate whether a simple warning sign can reduce the opportunity for potential offenders. To this end, we have conducted an empirical study to observe the prevalence of students/staff leaving their laptops unattended at a university study hall at the Delft University of Technology in the Netherlands, both with and without a warning sign. We observed 148 out of 220 subjects leaving their laptops unattended in just three weeks. The results also showed that without the warning banner, 75.5% (83 out of 110) of subjects left their laptops unattended and with the warning, only 59.1% (65 out of 110) of subjects showed the same behavior, which is a significant reduction of 16.4%. In addition, a qualitative analysis was performed on the responses of subjects who left their laptops unattended after the warning banner was placed. The results showed a mix of convenience, and a blind trust on the safety of the faculty. In conclusion, a simple banner was effective in reducing the opportunity for laptop theft. However, the percentage of laptops left unattended was still high even after the introduction of the banner.

Submitted 4 November, 2022; v1 submitted 18 July, 2019; originally announced July 2019.

Comments: The results in this paper are erroneous. Due to selection bias, the results are not statistically significant
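
The percentages quoted in the abstract above can be reproduced from the reported counts. The short Python check below is only an arithmetic illustration with hypothetical variable names. It shows that the quoted 16.4% is a difference in percentage points rather than a relative reduction, and it says nothing about statistical significance, which the authors' comment above disclaims.

```python
def share(count: int, total: int) -> float:
    """Return count / total as a percentage."""
    return 100.0 * count / total


# Counts as reported in the abstract.
without_banner = share(83, 110)   # subjects leaving laptops unattended, no banner
with_banner = share(65, 110)      # same behaviour with the warning banner
overall = share(83 + 65, 220)     # all observed subjects

# Absolute gap in percentage points (the figure quoted in the abstract).
point_difference = without_banner - with_banner
# Relative reduction with respect to the no-banner rate.
relative_reduction = 100.0 * point_difference / without_banner

print(round(without_banner, 1), round(with_banner, 1), round(point_difference, 1))
print(round(overall, 1), round(relative_reduction, 1))
```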

arXiv:1904.01371 [pdf, other]

doi 10.1007/978-3-030-62582-5_15

Beyond Labeling: Using Clustering to Build Network Behavioral Profiles of Malware Families

Authors: Azqa Nadeem, Christian Hammerschmidt, Carlos H. Gañán, Sicco Verwer

Abstract: Malware family labels are known to be inconsistent. They are also black-box since they do not represent the capabilities of malware. The current state-of-the-art in malware capability assessment include mostly manual approaches, which are infeasible due to the ever-increasing volume of discovered malware samples. We propose a novel unsupervised machine learning-based method called MalPaCA, which automates capability assessment by clustering the temporal behavior in malware's network traces. MalPaCA provides meaningful behavioral clusters using only 20 packet headers. Behavioral profiles are generated based on the cluster membership of malware's network traces. A Directed Acyclic Graph shows the relationship between malwares according to their overlapping behaviors. The behavioral profiles together with the DAG provide more insightful characterization of malware than current family designations. We also propose a visualization-based evaluation method for the obtained clusters to assist practitioners in understanding the clustering results. We apply MalPaCA on a financial malware dataset collected in the wild that comprises of 1.1k malware samples resulting in 3.6M packets. Our experiments show that (i) MalPaCA successfully identifies capabilities, such as port scans and reuse of Command and Control servers; (ii) It uncovers multiple discrepancies between behavioral clusters and malware family labels; and (iii) It demonstrates the effectiveness of clustering traces using temporal features by producing an error rate of 8.3%, compared to 57.5% obtained from statistical features.

Submitted 13 November, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: Accepted as a chapter in Springer MAAIDL 2020

arXiv:1506.01217 [pdf]

doi 10.1007/978-3-642-27207-3_12

A Safe Regression Testing Technique for Web Services based on WSDL Specification

Authors: Tehreem Masood, Aamer Nadeem, Gang-soo Lee

Abstract: Specification-based regression testing of web services is an important activity which verifies the quality of web services. A major problem in web services is that only provider has the source code and both user and broker only have the XML based specification. So from the perspective of user and broker, specification based regression testing of web services is needed. The existing techniques are code based. Due to the dynamic behavior of web services, web services undergo maintenance and evolution process rapidly. Retesting of web services is required in order to verify the impact of changes. In this paper, we present an automated safe specification based regression testing approach that uses original and modified WSDL specifications for change identification. All the relevant test cases are selected as reusable hence our regression test selection approach is safe.

Submitted 3 June, 2015; originally announced June 2015.

Comments: 9 figures, 2 tables, 11 pages, ASEA 2011

Journal ref: Communications in Computer and Information Science Volume 257, 2011, pp 108-119

arXiv:1102.4162 [pdf]

Comparative Study on DFD to UML Diagrams Transformations

Authors: Atif A. A. Jilani, Muhammad Usman, Aamer Nadeem

Abstract: Most of legacy systems use nowadays were modeled and documented using structured approach. Expansion of these systems in terms of functionality and maintainability requires shift towards object-oriented documentation and design, which has been widely accepted by the industry. In this paper, we present a survey of the existing Data Flow Diagram (DFD) to Unified Modeling language (UML) transformation techniques. We analyze transformation techniques using a set of parameters, identified in the survey. Based on identified parameters, we present an analysis matrix, which describes the strengths and weaknesses of transformation techniques. It is observed that most of the transformation approaches are rule based, which are incomplete and defined at abstract level that does not cover in depth transformation and automation issues. Transformation approaches are data centric, which focuses on data-store for class diagram generation. Very few of the transformation techniques have been applied on case study as a proof of concept, which are not comprehensive and majority of them are partially automated.

Submitted 21 February, 2011; originally announced February 2011.

Comments: 7 pages; ISSN: 2221-0741

Journal ref: World of Computer Science and Information Technology Journal(WCSIT), Vol. 1, No.1,10-16, 2011

arXiv:1003.2677 [pdf]

Classified Ads Harvesting Agent and Notification System

Authors: Razvi Doomun, Lollmahamod N., Auleear Nadeem, Mozafar Aukin

Abstract: The shift from an information society to a knowledge society require rapid information harvesting, reliable search and instantaneous on demand delivery. Information extraction agents are used to explore and collect data available from Web, in order to effectively exploit such data for business purposes, such as automatic news filtering, advertisement or product searching and price comparing. In this paper, we develop a real-time automatic harvesting agent for adverts posted on Servihoo web portal and an SMS-based notification system. It uses the URL of the web portal and the object model, i.e., the fields of interests and a set of rules written using the HTML parsing functions to extract latest adverts information. The extraction engine executes the extraction rules and stores the information in a database to be processed for automatic notification. This intelligent system helps to tremendously save time. It also enables users or potential product buyers to react more quickly to changes and newly posted sales adverts, paving the way to real-time best buy deals.

Submitted 13 March, 2010; originally announced March 2010.

Comments: International Conference on Information and Communication Technology for the Muslim World (ICT4M 2006), 21-23 November 2006, Kuala Lumpur, Malaysia

Showing 1–18 of 18 results for author: Nadeem, A