Search | arXiv e-print repository

arXiv:2201.01948 [pdf, other]

Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

Authors: Faheem Ullah, Shagun Dhingra, Xiaoyu Xia, M. Ali Babar

Abstract: Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in… ▽ More Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in a hybrid cloud, which is an emerging cloud model that integrates private and public clouds to use the best of both worlds. Therefore, in this paper, we evaluate the performance of Hadoop, Spark, and Flink in a hybrid cloud in terms of execution time, resource utilization, horizontal scalability, vertical scalability, and cost. For this study, our hybrid cloud consists of OpenStack (private cloud) and MS Azure (public cloud). We use both batch and iterative workloads for the evaluation. Our results show that in a hybrid cloud (i) the execution time increases as more nodes are borrowed by the private cloud from the public cloud, (ii) Flink outperforms Spark, which in turn outperforms Hadoop in terms of execution time, (iii) Hadoop transfers the largest amount of data among the nodes during the workload execution while Spark transfers the least amount of data, (iv) all three frameworks horizontally scale better as compared to vertical scaling, and (v) Spark is found to be least expensive in terms of $ cost for data processing while Hadoop is found the most expensive. △ Less

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2112.12597 [pdf, other]

Well Begun is Half Done: An Empirical Study of Exploitability & Impact of Base-Image Vulnerabilities

Authors: Mubin Ul Haque, M. Ali Babar

Abstract: Container technology, (e.g., Docker) is being widely adopted for deploying software infrastructures or applications in the form of container images. Security vulnerabilities in the container images are a primary concern for develo** containerized software. Exploitation of the vulnerabilities could result in disastrous impact, such as loss of confidentiality, integrity, and availability of contai… ▽ More Container technology, (e.g., Docker) is being widely adopted for deploying software infrastructures or applications in the form of container images. Security vulnerabilities in the container images are a primary concern for develo** containerized software. Exploitation of the vulnerabilities could result in disastrous impact, such as loss of confidentiality, integrity, and availability of containerized software. Understanding the exploitability and impact characteristics of vulnerabilities can help in securing the configuration of containerized software. However, there is a lack of research aimed at empirically identifying and understanding the exploitability and impact of vulnerabilities in container images. We carried out an empirical study to investigate the exploitability and impact of security vulnerabilities in base-images and their prevalence in open-source containerized software. We considered base-images since container images are built from base-images that provide all the core functionalities to build and operate containerized software. We discovered and characterized the exploitability and impact of security vulnerabilities in 261 base-images, which are the origin of 4,681 actively maintained official container images in the largest container registry, i.e., Docker Hub. To characterize the prevalence of vulnerable base-images in real-world projects, we analysed 64,579 containerized software from GitHub. Our analysis of a set of $1,983$ unique base-image security vulnerabilities revealed 13 novel findings. These findings are expected to help developers to understand the potential security problems related to base-images and encourage them to investigate base-images from security perspective before develo** their applications. △ Less

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.12595 [pdf, other]

KGSecConfig: A Knowledge Graph Based Approach for Secured Container Orchestrator Configuration

Authors: Mubin Ul Haque, M. Mehdi Kholoosi, M. Ali Babar

Abstract: Container Orchestrator (CO) is a vital technology for managing clusters of containers, which may form a virtualized infrastructure for develo** and operating software systems. Like any other software system, securing CO is critical, but can be quite challenging task due to large number of configurable options. Manual configuration is not only knowledge intensive and time consuming, but also is e… ▽ More Container Orchestrator (CO) is a vital technology for managing clusters of containers, which may form a virtualized infrastructure for develo** and operating software systems. Like any other software system, securing CO is critical, but can be quite challenging task due to large number of configurable options. Manual configuration is not only knowledge intensive and time consuming, but also is error prone. For automating security configuration of CO, we propose a novel Knowledge Graph based Security Configuration, KGSecConfig, approach. Our solution leverages keyword and learning models to systematically capture, link, and correlate heterogeneous and multi-vendor configuration space in a unified structure for supporting automation of security configuration of CO. We implement KGSecConfig on Kubernetes, Docker, Azure, and VMWare to build secured configuration knowledge graph. Our evaluation results show 0.98 and 0.94 accuracy for keyword and learning-based secured configuration option and concept extraction, respectively. We also demonstrate the utilization of the knowledge graph for automated misconfiguration mitigation in a Kubernetes cluster. We assert that our knowledge graph based approach can help in addressing several challenges, e.g., misconfiguration of security, associated with manually configuring the security of CO. △ Less

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.10356 [pdf, other]

An Investigation into Inconsistency of Software Vulnerability Severity across Data Sources

Authors: Roland Croft, M. Ali Babar, Li Li

Abstract: Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent… ▽ More Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent analysis may provide conflicting severity rankings. Inconsistency across these data sources affects the reliability of severity assessment data, and can consequently impact SV prioritization and fixing. In this study, we investigate severity ranking inconsistencies over the SV reporting lifecycle. Our analysis helps characterize the nature of this problem, identify correlated factors, and determine the impacts of inconsistency on downstream tasks. Our findings observe that SV severity often lacks consideration or is underestimated during initial reporting, and such SVs consequently receive lower prioritization. We identify six potential attributes that are correlated to this misjudgment, and show that inconsistency in severity reporting schemes can severely degrade the performance of downstream severity prediction by up to 77%. Our findings help raise awareness of SV severity data inconsistencies and draw attention to this data quality problem. These insights can help developers better consider SV severity data sources, and improve the reliability of consequent SV prioritization. Furthermore, we encourage researchers to provide more attention to SV severity data selection. △ Less

Submitted 16 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: Accepted for publication in SANER 22

arXiv:2112.10354 [pdf, other]

Systematic Literature Review on Cyber Situational Awareness Visualizations

Authors: Liuyue Jiang, Asangi Jayatilaka, Mehwish Nasim, Marthie Grobler, Mansooreh Zahedi, M. Ali Babar

Abstract: The dynamics of cyber threats are increasingly complex, making it more challenging than ever for organizations to obtain in-depth insights into their cyber security status. Therefore, organizations rely on Cyber Situational Awareness (CSA) to support them in better understanding the threats and associated impacts of cyber events. Due to the heterogeneity and complexity of cyber security data, ofte… ▽ More The dynamics of cyber threats are increasingly complex, making it more challenging than ever for organizations to obtain in-depth insights into their cyber security status. Therefore, organizations rely on Cyber Situational Awareness (CSA) to support them in better understanding the threats and associated impacts of cyber events. Due to the heterogeneity and complexity of cyber security data, often with multidimensional attributes, sophisticated visualization techniques are needed to achieve CSA. However, there have been no previous attempts to systematically review and analyze the scientific literature on CSA visualizations. In this paper, we systematically select and review 54 publications that discuss visualizations to support CSA. We extract data from these papers to identify key stakeholders, information types, data sources, and visualization techniques. Furthermore, we analyze the level of CSA supported by the visualizations, alongside examining the maturity of the visualizations, challenges, and practices related to CSA visualizations to prepare a full analysis of the current state of CSA in an organizational context. Our results reveal certain gaps in CSA visualizations. For instance, the largest focus is on operational-level staff, and there is a clear lack of visualizations targeting other types of stakeholders such as managers, higher-level decision makers, and non-expert users. Most papers focus on threat information visualization, and there is a dearth of papers that visualize impact information, response plans, and information shared within teams. Based on the results that highlight the important concerns in CSA visualizations, we recommend a list of future research directions. △ Less

Submitted 24 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

arXiv:2112.06356 [pdf, other]

Evaluation of Security Training and Awareness Programs: Review of Current Practices and Guideline

Authors: Asangi Jayatilaka, Nathan Beu, Irina Baetu, Mansooreh Zahedi, M. Ali Babar, Laura Hartley, Winston Lewinsmith

Abstract: Evaluating the effectiveness of security awareness and training programs is critical for minimizing organizations' human security risk. Based on a literature review and industry interviews, we discuss current practices and devise guidelines for measuring the effectiveness of security training and awareness initiatives used by organizations Evaluating the effectiveness of security awareness and training programs is critical for minimizing organizations' human security risk. Based on a literature review and industry interviews, we discuss current practices and devise guidelines for measuring the effectiveness of security training and awareness initiatives used by organizations △ Less

Submitted 12 December, 2021; originally announced December 2021.

Comments: 12 pages

arXiv:2112.00853 [pdf]

On the Scalability of Big Data Cyber Security Analytics Systems

Authors: Faheem Ullah, Muhammad Ali Babar

Abstract: Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Apache Spark) to collect, store, and analyze a large volume of security event data for detecting cyber-attacks. The volume of digital data in general and security event data in specific is increasing exponentially. The velocity with which the security event data is generated and fed into a BDCA system is unpredictabl… ▽ More Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Apache Spark) to collect, store, and analyze a large volume of security event data for detecting cyber-attacks. The volume of digital data in general and security event data in specific is increasing exponentially. The velocity with which the security event data is generated and fed into a BDCA system is unpredictable. Therefore, a BDCA system should be highly scalable to deal with the unpredictable increase/decrease in the velocity of security event data. However, there has been little effort to investigate the scalability of BDCA systems to identify and exploit the sources of scalability improvement. In this paper, we first investigate the scalability of a Spark-based BDCA system with default Spark settings. we then identify Spark configuration parameters (e.g., execution memory) that can significantly impact the scalability of a BDCA system. Based on the identified parameters, we finally propose a parameter-driven adaptation approach, SCALER, for optimizing a system's scalability. We have conducted a set of experiments by implementing a Spark-based BDCA system on a large-scale OpenStack cluster. We ran our experiments with four security datasets. We have found that (i) a BDCA system with default Spark configuration parameters deviates from ideal scalability by 59.5% (ii) 9 out of 11 studied Spark configuration parameters significantly impact scalability (iii) SCALER improves the BDCA system's scalability by 20.8% compared to the scalability with default Spark parameter setting. The findings of our study highlight the importance of exploring the parameter space of the underlying big data framework (e.g., Apache Spark) for scalable cyber security analytics. △ Less

Submitted 28 November, 2021; originally announced December 2021.

arXiv:2110.01927 [pdf, other]

LogDP: Combining Dependency and Proximity for Log-based Anomaly Detection

Authors: Yongzheng Xie, Hongyu Zhang, Bo Zhang, Muhammad Ali Babar, Sha Lu

Abstract: Log analysis is an important technique that engineers use for troubleshooting faults of large-scale service-oriented systems. In this study, we propose a novel semi-supervised log-based anomaly detection approach, LogDP, which utilizes the dependency relationships among log events and proximity among log sequences to detect the anomalies in massive unlabeled log data. LogDP divides log events into… ▽ More Log analysis is an important technique that engineers use for troubleshooting faults of large-scale service-oriented systems. In this study, we propose a novel semi-supervised log-based anomaly detection approach, LogDP, which utilizes the dependency relationships among log events and proximity among log sequences to detect the anomalies in massive unlabeled log data. LogDP divides log events into dependent and independent events, then learns normal patterns of dependent events using dependency and independent events using proximity. Events violating any normal pattern are identified as anomalies. By combining dependency and proximity, LogDP is able to achieve high detection accuracy. Extensive experiments have been conducted on real-world datasets, and the results show that LogDP outperforms six state-of-the-art methods. △ Less

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2109.07260 [pdf, other]

Evaluation of Distributed Databases in Hybrid Clouds and Edge Computing: Energy, Bandwidth, and Storage Consumption

Authors: Yaser Mansouri, Victor Prokhorenko, Faheem Ullah, M. Ali Babar

Abstract: A benchmark study of modern distributed databases is an important source of information to select the right technology for managing data in the cloud-edge paradigms. To make the right decision, it is required to conduct an extensive experimental study on a variety of hardware infrastructures. While most of the state-of-the-art studies have investigated only response time and scalability of distrib… ▽ More A benchmark study of modern distributed databases is an important source of information to select the right technology for managing data in the cloud-edge paradigms. To make the right decision, it is required to conduct an extensive experimental study on a variety of hardware infrastructures. While most of the state-of-the-art studies have investigated only response time and scalability of distributed databases, focusing on other various metrics (e.g., energy, bandwidth, and storage consumption) is essential to fully understand the resources consumption of the distributed databases. Also, existing studies have explored the response time and scalability of these databases either in private or public cloud. Hence, there is a paucity of investigation into the evaluation of these databases deployed in a hybrid cloud, which is the seamless integration of public and private cloud. To address these research gaps, in this paper, we investigate energy, bandwidth and storage consumption of the most used and common distributed databases. For this purpose, we have evaluated four open-source databases (Cassandra, Mongo, Redis and MySQL) on the hybrid cloud spanning over local OpenStack and Microsoft Azure, and a variety of edge computing nodes including Raspberry Pi, a cluster of Raspberry Pi, and low and high power servers. Our extensive experimental results reveal several helpful insights for the deployment selection of modern distributed databases in edge-cloud environments. △ Less

Submitted 8 January, 2023; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: 19 pages, 13 figures

arXiv:2109.05740 [pdf, other]

Data Preparation for Software Vulnerability Prediction: A Systematic Literature Review

Authors: Roland Croft, Yongzheng Xie, M. Ali Babar

Abstract: Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulties of preparing Software Vulnerability (SV) related data is considered as the main barrier to industrial adoption of SVP approaches. Given the increasing, but dispersed, literature on… ▽ More Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulties of preparing Software Vulnerability (SV) related data is considered as the main barrier to industrial adoption of SVP approaches. Given the increasing, but dispersed, literature on this topic, it is needed and timely to systematically select, review, and synthesize the relevant peer-reviewed papers reporting the existing SV data preparation techniques and challenges. We have carried out a Systematic Literature Review (SLR) of SVP research in order to develop a systematized body of knowledge of the data preparation challenges, solutions, and the needed research. Our review of the 61 relevant papers has enabled us to develop a taxonomy of data preparation for SVP related challenges. We have analyzed the identified challenges and available solutions using the proposed taxonomy. Our analysis of the state of the art has enabled us identify the opportunities for future research. This review also provides a set of recommendations for researchers and practitioners of SVP approaches. △ Less

Submitted 26 April, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted for publication in TSE

arXiv:2109.04029 [pdf, other]

Automated Security Assessment for the Internet of Things

Authors: Xuanyu Duan, Mengmeng Ge, Triet H. M. Le, Faheem Ullah, Shang Gao, Xuequan Lu, M. Ali Babar

Abstract: Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learni… ▽ More Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner. △ Less

Submitted 9 September, 2021; originally announced September 2021.

Comments: Accepted for publication at the 26th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2021)

arXiv:2108.08041 [pdf, other]

DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning

Authors: Triet H. M. Le, David Hin, Roland Croft, M. Ali Babar

Abstract: It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they are detected to provide timely information about the exploitability, impact and severity of SVs. Such information is important to plan and prioritize the mitiga… ▽ More It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they are detected to provide timely information about the exploitability, impact and severity of SVs. Such information is important to plan and prioritize the mitigation for the identified SVs. We propose a novel Deep multi-task learning model, DeepCVA, to automate seven Commit-level Vulnerability Assessment tasks simultaneously based on Common Vulnerability Scoring System (CVSS) metrics. We conduct large-scale experiments on 1,229 vulnerability-contributing commits containing 542 different SVs in 246 real-world software projects to evaluate the effectiveness and efficiency of our model. We show that DeepCVA is the best-performing model with 38% to 59.8% higher Matthews Correlation Coefficient than many supervised and unsupervised baseline models. DeepCVA also requires 6.3 times less training and validation time than seven cumulative assessment models, leading to significantly less model maintenance cost as well. Overall, DeepCVA presents the first effective and efficient solution to automatically assess SVs early in software systems. △ Less

Submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted as a full paper at the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2021

arXiv:2108.06705 [pdf, other]

A Qualitative Study of Architectural Design Issues in DevOps

Authors: Mojtaba Shahin, Ali Rezaei Nasab, Muhammad Ali Babar

Abstract: Software architecture is critical in succeeding with DevOps. However, designing software architectures that enable and support DevOps (DevOps-driven software architectures) is a challenge for organizations. We assert that one of the essential steps towards characterizing DevOps-driven architectures is to understand architectural design issues raised in DevOps. At the same time, some of the archite… ▽ More Software architecture is critical in succeeding with DevOps. However, designing software architectures that enable and support DevOps (DevOps-driven software architectures) is a challenge for organizations. We assert that one of the essential steps towards characterizing DevOps-driven architectures is to understand architectural design issues raised in DevOps. At the same time, some of the architectural issues that emerge in the DevOps context (and their corresponding architectural practices or tactics) may stem from the context (i.e., domain) and characteristics of software organizations. To this end, we conducted a mixed-methods study that consists of a qualitative case study of two teams in a company during their DevOps transformation and a content analysis of Stack Overflow and DevOps Stack Exchange posts to understand architectural design issues in DevOps. Our study found eight specific and contextual architectural design issues faced by the two teams and classified architectural design issues discussed in Stack Overflow and DevOps Stack Exchange into 11 groups. Our aggregated results reveal that the main characteristics of DevOps-driven architectures are: being loosely coupled and prioritizing deployability, testability, supportability, and modifiability over other quality attributes. Finally, we discuss some concrete implications for research and practice. △ Less

Submitted 12 November, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

Comments: Preprint accepted for publication in Journal of Software: Evolution and Process, 2021. 38 Pages, 6 Tables, 11 Figures. This article is an extended version of the ICSSP2020 paper (the preprint is available at arXiv:2003.06108). arXiv admin note: text overlap with arXiv:2003.06108

arXiv:2108.04766 [pdf]

Falling for Phishing: An Empirical Investigation into People's Email Response Behaviors

Authors: Asangi Jayatilaka, Nalin Asanka Gamagedara Arachchilage, Muhammad Ali Babar

Abstract: Despite sophisticated phishing email detection systems, and training and awareness programs, humans continue to be tricked by phishing emails. In an attempt to better understand why phishing email attacks still work and how best to mitigate them, we have carried out an empirical study to investigate people's thought processes when reading their emails. We used a scenario-based role-play "think alo… ▽ More Despite sophisticated phishing email detection systems, and training and awareness programs, humans continue to be tricked by phishing emails. In an attempt to better understand why phishing email attacks still work and how best to mitigate them, we have carried out an empirical study to investigate people's thought processes when reading their emails. We used a scenario-based role-play "think aloud" method and follow-up interviews to collect data from 19 participants. The experiment was conducted using a simulated web email client, and real phishing and legitimate emails adapted to the given scenario. The analysis of the collected data has enabled us to identify eleven factors that influence people's response decisions to both phishing and legitimate emails. Furthermore, based on the user study findings, we discuss novel insights into flaws in the general email decision-making behaviors that could make people susceptible to phishing attacks. △ Less

Submitted 6 October, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: The 42nd International Conference on Information Systems (ICIS'21), Austin, Texas, USA, 2021, 17

Journal ref: The 42nd International Conference on Information Systems (ICIS'21), Austin, Texas, USA, 2021, 17

arXiv:2108.02133 [pdf]

The Impact of Traceability on Software Maintenance and Evolution: A Map** Study

Authors: Fangchao Tian, Tianlu Wang, Peng Liang, Chong Wang, Arif Ali Khan, Muhammad Ali Babar

Abstract: Software traceability plays a critical role in software maintenance and evolution. We conducted a systematic map** study with six research questions to understand the benefits, costs, and challenges of using traceability in maintenance and evolution. We systematically selected, analyzed, and synthesized 63 studies published between January 2000 and May 2020, and the results show that: traceabili… ▽ More Software traceability plays a critical role in software maintenance and evolution. We conducted a systematic map** study with six research questions to understand the benefits, costs, and challenges of using traceability in maintenance and evolution. We systematically selected, analyzed, and synthesized 63 studies published between January 2000 and May 2020, and the results show that: traceability supports 11 maintenance and evolution activities, among which change management is the most frequently supported activity; strong empirical evidence from industry is needed to validate the impact of traceability on maintenance and evolution; easing the process of change management is the main benefit of deploying traceability practices; establishing and maintaining traceability links is the main cost of deploying traceability practices; 13 approaches and 32 tools that support traceability in maintenance and evolution were identified; improving the quality of traceability links, the performance of using traceability approaches and tools are the main traceability challenges in maintenance and evolution. The findings of this study provide a comprehensive understanding of deploying traceability practices in software maintenance and evolution phase, and can be used by researchers for future directions and practitioners for making informed decisions while using traceability in maintenance and evolution. △ Less

Submitted 4 August, 2021; originally announced August 2021.

Comments: Preprint accepted for publication in Journal of Software: Evolution and Process, 2021

arXiv:2108.01018 [pdf]

Relationships between Software Architecture and Source Code in Practice: An Exploratory Survey and Interview

Authors: Fangchao Tian, Peng Liang, Muhammad Ali Babar

Abstract: Context: Software Architecture (SA) and Source Code (SC) are two intertwined artefacts that represent the interdependent design decisions made at different levels of abstractions - High-Level (HL) and Low-Level (LL). An understanding of the relationships between SA and SC is expected to bridge the gap between SA and SC for supporting maintenance and evolution of software systems. Objective: We aim… ▽ More Context: Software Architecture (SA) and Source Code (SC) are two intertwined artefacts that represent the interdependent design decisions made at different levels of abstractions - High-Level (HL) and Low-Level (LL). An understanding of the relationships between SA and SC is expected to bridge the gap between SA and SC for supporting maintenance and evolution of software systems. Objective: We aimed at exploring practitioners' understanding about the relationships between SA and SC. Method: We used a mixed-method that combines an online survey with 87 respondents and an interview with 8 participants to collect the views of practitioners from 37 countries about the relationships between SA and SC. Results: Our results reveal that: practitioners mainly discuss five features of relationships between SA and SC; a few practitioners have adopted dedicated approaches and tools in the literature for identifying and analyzing the relationships between SA and SC despite recognizing the importance of such information for improving a system's quality attributes, especially maintainability and reliability. It is felt that cost and effort are the major impediments that prevent practitioners from identifying, analyzing, and using the relationships between SA and SC. Conclusions: The results have empirically identified five features of relationships between SA and SC reported in the literature from the perspective of practitioners and a systematic framework to manage the five features of relationships should be developed with dedicated approaches and tools considering the cost and benefit of maintaining the relationships. △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: Preprint accepted for publication in Information and Software Technology, 2021

arXiv:2107.13723 [pdf, other]

An Empirical Study of Developers' Discussions about Security Challenges of Different Programming Languages

Authors: Roland Croft, Yongzheng Xie, Mansooreh Zahedi, M. Ali Babar, Christoph Treude

Abstract: Given programming languages can provide different types and levels of security support, it is critically important to consider security aspects while selecting programming languages for develo** software systems. Inadequate consideration of security in the choice of a programming language may lead to potential ramifications for secure development. Whilst theoretical analysis of the supposed secu… ▽ More Given programming languages can provide different types and levels of security support, it is critically important to consider security aspects while selecting programming languages for develo** software systems. Inadequate consideration of security in the choice of a programming language may lead to potential ramifications for secure development. Whilst theoretical analysis of the supposed security properties of different programming languages has been conducted, there has been relatively little effort to empirically explore the actual security challenges experienced by developers. We have performed a large-scale study of the security challenges of 15 programming languages by quantitatively and qualitatively analysing the developers' discussions from Stack Overflow and GitHub. By leveraging topic modelling, we have derived a taxonomy of 18 major security challenges for 6 topic categories. We have also conducted comparative analysis to understand how the identified challenges vary regarding the different programming languages and data sources. Our findings suggest that the challenges and their characteristics differ substantially for different programming languages and data sources, i.e., Stack Overflow and GitHub. The findings provide evidence-based insights and understanding of security challenges related to different programming languages to software professionals (i.e., practitioners or researchers). The reported taxonomy of security challenges can assist both practitioners and researchers in better understanding and traversing the secure development landscape. This study highlights the importance of the choice of technology, e.g., programming language, in secure software engineering. Hence, the findings are expected to motivate practitioners to consider the potential impact of the choice of programming languages on software security. △ Less

Submitted 26 November, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: To be published in EMSE

arXiv:2107.08364 [pdf, other]

doi 10.1145/3529757

A Survey on Data-driven Software Vulnerability Assessment and Prioritization

Authors: Triet H. M. Le, Huaming Chen, M. Ali Babar

Abstract: Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surges in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken… ▽ More Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surges in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of the past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues. △ Less

Submitted 3 April, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

Comments: Accepted for publication in the ACM Computing Surveys journal (CSUR), 2022

Journal ref: ACM Comput. Surv., 55, 5 (2022), Article 100

arXiv:2107.02096 [pdf, other]

An Empirical Analysis of Practitioners' Perspectives on Security Tool Integration into DevOps

Authors: Roshan Namal Rajapakse, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: Background: Security tools play a vital role in enabling developers to build secure software. However, it can be quite challenging to introduce and fully leverage security tools without affecting the speed or frequency of deployments in the DevOps paradigm. Aims: We aim to empirically investigate the key challenges practitioners face when integrating security tools into a DevOps workflow in order… ▽ More Background: Security tools play a vital role in enabling developers to build secure software. However, it can be quite challenging to introduce and fully leverage security tools without affecting the speed or frequency of deployments in the DevOps paradigm. Aims: We aim to empirically investigate the key challenges practitioners face when integrating security tools into a DevOps workflow in order to provide recommendations to overcome them. Method: We conducted a study involving 31 systematically selected webinars on integrating security tools in DevOps. We used a qualitative data analysis method, i.e., thematic analysis, to identify the challenges and emerging solutions related to integrating security tools in rapid deployment environments. Results: We find that while traditional security tools are unable to cater for the needs of DevOps, the industry is moving towards new generations of tools that have started focusing on these requirements. We have developed a DevOps workflow that integrates security tools and a set of guidelines by synthesizing practitioners' recommendations in the analyzed webinars. Conclusion: While the latest security tools are addressing some of the requirements of DevOps, there are many tool-related drawbacks yet to be adequately addressed. △ Less

Submitted 19 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: [v3] Camera-ready version (with a few improvements)

arXiv:2107.01921 [pdf, other]

An Empirical Study of Rule-Based and Learning-Based Approaches for Static Application Security Testing

Authors: Roland Croft, Dominic Newlands, Ziyu Chen, M. Ali Babar

Abstract: Background: Static Application Security Testing (SAST) tools purport to assist developers in detecting security issues in source code. These tools typically use rule-based approaches to scan source code for security vulnerabilities. However, due to the significant shortcomings of these tools (i.e., high false positive rates), learning-based approaches for Software Vulnerability Prediction (SVP) ar… ▽ More Background: Static Application Security Testing (SAST) tools purport to assist developers in detecting security issues in source code. These tools typically use rule-based approaches to scan source code for security vulnerabilities. However, due to the significant shortcomings of these tools (i.e., high false positive rates), learning-based approaches for Software Vulnerability Prediction (SVP) are becoming a popular approach. Aims: Despite the similar objectives of these two approaches, their comparative value is unexplored. We provide an empirical analysis of SAST tools and SVP models, to identify their relative capabilities for source code security analysis. Method: We evaluate the detection and assessment performance of several common SAST tools and SVP models on a variety of vulnerability datasets. We further assess the viability and potential benefits of combining the two approaches. Results: SAST tools and SVP models provide similar detection capabilities, but SVP models exhibit better overall performance for both detection and assessment. Unification of the two approaches is difficult due to lacking synergies. Conclusions: Our study generates 12 main findings which provide insights into the capabilities and synergy of these two approaches. Through these observations we provide recommendations for use and improvement. △ Less

Submitted 15 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: To be published in ESEM 21; reduced length

arXiv:2106.03458 [pdf, other]

doi 10.1145/3468264.3468595

A Grounded Theory of the Role of Coordination in Software Security Patch Management

Authors: Nesara Dissanayake, Mansooreh Zahedi, Asangi Jayatilaka, Muhammad Ali Babar

Abstract: Several disastrous security attacks can be attributed to delays in patching software vulnerabilities. While researchers and practitioners have paid significant attention to automate vulnerabilities identification and patch development activities of software security patch management, there has been relatively little effort dedicated to gain an in-depth understanding of the socio-technical aspects,… ▽ More Several disastrous security attacks can be attributed to delays in patching software vulnerabilities. While researchers and practitioners have paid significant attention to automate vulnerabilities identification and patch development activities of software security patch management, there has been relatively little effort dedicated to gain an in-depth understanding of the socio-technical aspects, e.g., coordination of interdependent activities of the patching process and patching decisions, that may cause delays in applying security patches. We report on a Grounded Theory study of the role of coordination in security patch management. The reported theory consists of four inter-related dimensions, i.e., causes, breakdowns, constraints, and mechanisms. The theory explains the causes that define the need for coordination among interdependent software and hardware components and multiple stakeholders' decisions, the constraints that can negatively impact coordination, the breakdowns in coordination, and the potential corrective measures. This study provides potentially useful insights for researchers and practitioners who can carefully consider the needs of and devise suitable solutions for supporting the coordination of interdependencies involved in security patch management. △ Less

Submitted 18 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted for publication at the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21)

arXiv:2104.11906 [pdf, other]

doi 10.1145/3558001

A Review on C3I Systems' Security: Vulnerabilities, Attacks, and Countermeasures

Authors: Hussain Ahmad, Isuru Dharmadasa, Faheem Ullah, M. Ali Babar

Abstract: Command, Control, Communication, and Intelligence (C3I) systems are increasingly used in critical civil and military domains for achieving information superiority, operational efficacy, and greater situational awareness. Unlike traditional systems facing widespread cyber-attacks, the sensitive nature of C3I tactical operations make their cybersecurity a critical concern. For instance, tampering or… ▽ More Command, Control, Communication, and Intelligence (C3I) systems are increasingly used in critical civil and military domains for achieving information superiority, operational efficacy, and greater situational awareness. Unlike traditional systems facing widespread cyber-attacks, the sensitive nature of C3I tactical operations make their cybersecurity a critical concern. For instance, tampering or intercepting confidential information in military battlefields not only damages C3I operations, but also causes irreversible consequences such as loss of human lives and mission failures. Therefore, C3I systems have become a focal point for cyber adversaries. Moreover, technological advancements and modernization of C3I systems have significantly increased the potential risk of cyber-attacks on C3I systems. Consequently, cyber adversaries leverage highly sophisticated attack vectors to exploit security vulnerabilities in C3I systems. Despite the burgeoning significance of cybersecurity for C3I systems, the existing literature lacks a comprehensive review to systematize the body of knowledge on C3I systems' security. Therefore, in this paper, we have gathered, analyzed, and synthesized the state-of-the-art on the cybersecurity of C3I systems. In particular, this paper has identified security vulnerabilities, attack vectors, and countermeasures/defenses for C3I systems. Furthermore, our survey has enabled us to: (i) propose a taxonomy for security vulnerabilities, attack vectors and countermeasures; (ii) interrelate attack vectors with security vulnerabilities and countermeasures; and (iii) propose future research directions for advancing the state-of-the-art on the cybersecurity of C3I systems. △ Less

Submitted 31 January, 2022; v1 submitted 24 April, 2021; originally announced April 2021.

arXiv:2104.01340 [pdf, ps, other]

Retrolensing by a spherically symmetric naked singularity

Authors: Gulmina Zaman Babar, Farruh Atamurotov, Abdullah Zaman Babar, Yen-Kheng Lim

Abstract: Considering a strong field limit, we investigate the retrolensing phenomenon in the vicinity of a Janis-Newman-Winicour (JNW) naked singularity embedded in a scalar field. We assume that the light rays from a nearby source are reflected by the photon sphere of the naked singularity, acting as a lens, to create a pair of images. The analytic expressions of the lensing coefficients $\bar{a}$ and… ▽ More Considering a strong field limit, we investigate the retrolensing phenomenon in the vicinity of a Janis-Newman-Winicour (JNW) naked singularity embedded in a scalar field. We assume that the light rays from a nearby source are reflected by the photon sphere of the naked singularity, acting as a lens, to create a pair of images. The analytic expressions of the lensing coefficients $\bar{a}$ and $\bar{b}$ are obtained, which are scalar dependent and discordant with \cite{Bozza:2002b}. Moreover, considering the powerful supermassive black hole candidates Sgr A* and M87*, we examined the influence of the scalar field on the apparent brightness and angular positions of the parity images. Our results are highlighted in correspondence with a non-scalar field gravity, specifically the Schwarzschild gravity. We have found that the brightness increases in the presence of the scalar field, while, the lensing coefficients $\bar{a}$, $\bar{b}$, the angular positions and the angular separation of the relativistic images experience a reverse effect. △ Less

Submitted 6 December, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: 16 pages, 7 figures

arXiv:2103.11316 [pdf, other]

doi 10.1109/MSR.2019.00063

Automated Software Vulnerability Assessment with Concept Drift

Authors: Triet H. M. Le, Bushra Sabir, M. Ali Babar

Abstract: Software Engineering researchers are increasingly using Natural Language Processing (NLP) techniques to automate Software Vulnerabilities (SVs) assessment using the descriptions in public repositories. However, the existing NLP-based approaches suffer from concept drift. This problem is caused by a lack of proper treatment of new (out-of-vocabulary) terms for the evaluation of unseen SVs over time… ▽ More Software Engineering researchers are increasingly using Natural Language Processing (NLP) techniques to automate Software Vulnerabilities (SVs) assessment using the descriptions in public repositories. However, the existing NLP-based approaches suffer from concept drift. This problem is caused by a lack of proper treatment of new (out-of-vocabulary) terms for the evaluation of unseen SVs over time. To perform automated SVs assessment with concept drift using SVs' descriptions, we propose a systematic approach that combines both character and word features. The proposed approach is used to predict seven Vulnerability Characteristics (VCs). The optimal model of each VC is selected using our customized time-based cross-validation method from a list of eight NLP representations and six well-known Machine Learning models. We have used the proposed approach to conduct large-scale experiments on more than 100,000 SVs in the National Vulnerability Database (NVD). The results show that our approach can effectively tackle the concept drift issue of the SVs' descriptions reported from 2000 to 2018 in NVD even without retraining the model. In addition, our approach performs competitively compared to the existing word-only method. We also investigate how to build compact concept-drift-aware models with much fewer features and give some recommendations on the choice of classifiers and NLP representations for SVs assessment. △ Less

Submitted 21 March, 2021; originally announced March 2021.

Comments: Published as a full paper at the 16th International Conference on Mining Software Repositories 2019

Journal ref: Proceedings of the 16th International Conference on Mining Software Repositories, 2019, pp. 371-382

arXiv:2103.08306 [pdf, other]

ReinforceBug: A Framework to Generate Adversarial Textual Examples

Authors: Bushra Sabir, M. Ali Babar, Raj Gaire

Abstract: Adversarial Examples (AEs) generated by perturbing original training examples are useful in improving the robustness of Deep Learning (DL) based models. Most prior works, generate AEs that are either unconscionable due to lexical errors or semantically or functionally deviant from original examples. In this paper, we present ReinforceBug, a reinforcement learning framework, that learns a policy th… ▽ More Adversarial Examples (AEs) generated by perturbing original training examples are useful in improving the robustness of Deep Learning (DL) based models. Most prior works, generate AEs that are either unconscionable due to lexical errors or semantically or functionally deviant from original examples. In this paper, we present ReinforceBug, a reinforcement learning framework, that learns a policy that is transferable on unseen datasets and generates utility-preserving and transferable (on other models) AEs. Our results show that our method is on average 10% more successful as compared to the state-of-the-art attack TextFooler. Moreover, the target models have on average 73.64% confidence in the wrong prediction, the generated AEs preserve the functional equivalence and semantic similarity (83.38% ) to their original counterparts, and are transferable on other models with an average success rate of 46%. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: Accepted in NAACL-HLT 2021

arXiv:2103.08266 [pdf, other]

doi 10.1016/j.infsof.2021.106700

Challenges and solutions when adopting DevSecOps: A systematic review

Authors: Roshan N. Rajapakse, Mansooreh Zahedi, M. Ali Babar, Haifeng Shen

Abstract: Context: DevOps has become one of the fastest-growing software development paradigms in the industry. However, this trend has presented the challenge of ensuring secure software delivery while maintaining the agility of DevOps. The efforts to integrate security in DevOps have resulted in the DevSecOps paradigm, which is gaining significant interest from both industry and academia. However, the ado… ▽ More Context: DevOps has become one of the fastest-growing software development paradigms in the industry. However, this trend has presented the challenge of ensuring secure software delivery while maintaining the agility of DevOps. The efforts to integrate security in DevOps have resulted in the DevSecOps paradigm, which is gaining significant interest from both industry and academia. However, the adoption of DevSecOps in practice is proving to be a challenge. Objective: This study aims to systemize the knowledge about the challenges faced by practitioners when adopting DevSecOps and the proposed solutions reported in the literature. We also aim to identify the areas that need further research in the future. Method: We conducted a Systematic Literature Review of 54 peer-reviewed studies. The thematic analysis method was applied to analyze the extracted data. Results: We identified 21 challenges related to adopting DevSecOps, 31 specific solutions, and the map** between these findings. We also determined key gap areas in this domain by holistically evaluating the available solutions against the challenges. The results of the study were classified into four themes: People, Practices, Tools, and Infrastructure. Our findings demonstrate that tool-related challenges and solutions were the most frequently reported, driven by the need for automation in this paradigm. Shift-left security and continuous security assessment were two key practices recommended for DevSecOps. Conclusions: We highlight the need for developer-centered application security testing tools that target the continuous practices in DevSecOps. More research is needed on how the traditionally manual security practices can be automated to suit rapid software deployment cycles. Finally, achieving a suitable balance between the speed of delivery and security is a significant issue practitioners face in the DevSecOps paradigm. △ Less

Submitted 29 July, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: Addressed reviewer comments from the Information and Software Technology (IST) journal

Journal ref: Information and software technology 141 (2022) 106700

arXiv:2103.00316 [pdf, ps, other]

Gravitational lensing in 4-D Einstein-Gauss-Bonnet gravity in the presence of plasma

Authors: Gulmina Zaman Babar, Farruh Atamurotov, Abdullah Zaman Babar

Abstract: In this paper we have assumed a weak-field regime to explore the gravitational lensed photons in a 4 dimensional Einstein-Gauss-Bonnet gravity, which is very much in the limelight these days. The investigation is conducted in three distinct paradigms: uniform plasma, singular isothermal sphere and a non-singular isothermal sphere. The lensing angle associated with the distribution factor of the me… ▽ More In this paper we have assumed a weak-field regime to explore the gravitational lensed photons in a 4 dimensional Einstein-Gauss-Bonnet gravity, which is very much in the limelight these days. The investigation is conducted in three distinct paradigms: uniform plasma, singular isothermal sphere and a non-singular isothermal sphere. The lensing angle associated with the distribution factor of the medium is individually derived for each case and further utilized to study the magnification of the image source brightness, selectively for the uniform plasma and singular isothermal sphere. The attained results are brought forth in contrast with the standard Schwarzschild geometry △ Less

Submitted 2 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: 10 pages, 13 figures

Journal ref: Physics of the Dark Universe, 32 (2021) 100798

arXiv:2101.10412 [pdf]

End-Users' Knowledge and Perception about Security of Mobile Health Apps: A Case Study with Two Saudi Arabian mHealth Providers

Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

Abstract: Mobile health applications (mHealth apps for short) are being increasingly adopted in the healthcare sector, enabling stakeholders such as governments, health units, medics, and patients, to utilize health services in a pervasive manner. Despite having several known benefits, mHealth apps entail significant security and privacy challenges that can lead to data breaches with serious social, legal,… ▽ More Mobile health applications (mHealth apps for short) are being increasingly adopted in the healthcare sector, enabling stakeholders such as governments, health units, medics, and patients, to utilize health services in a pervasive manner. Despite having several known benefits, mHealth apps entail significant security and privacy challenges that can lead to data breaches with serious social, legal, and financial consequences. This research presents an empirical investigation about security awareness of end-users of mHealth apps that are available on major mobile platforms, including Android and iOS. We collaborated with two mHealth providers in Saudi Arabia to survey 101 end-users, investigating their security awareness about (i) existing and desired security features, (ii) security related issues, and (iii) methods to improve security knowledge. Findings indicate that majority of the end-users are aware of the existing security features provided by the apps (e.g., restricted app permissions); however, they desire usable security (e.g., biometric authentication) and are concerned about privacy of their health information (e.g., data anonymization). End-users suggested that protocols such as session timeout or Two-factor authentication (2FA) positively impact security but compromise usability of the app. Security-awareness via social media, peer guidance, or training from app providers can increase end-users trust in mHealth apps. This research investigates human-centric knowledge based on empirical evidence and provides a set of guidelines to develop secure and usable mHealth apps. △ Less

Submitted 23 September, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

Comments: This research is 29 pages. It has 9 figures, and 7 tables

arXiv:2012.09344 [pdf, other]

Machine Learning for Detecting Data Exfiltration: A Review

Authors: Bushra Sabir, Faheem Ullah, M. Ali Babar, Raj Gaire

Abstract: Context: Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize the ML-based data exfiltration countermeasures for building a body of knowledge on this important topic. Objective… ▽ More Context: Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize the ML-based data exfiltration countermeasures for building a body of knowledge on this important topic. Objective: This paper aims at systematically reviewing ML-based data exfiltration countermeasures to identify and classify ML approaches, feature engineering techniques, evaluation datasets, and performance metrics used for these countermeasures. This review also aims at identifying gaps in research on ML-based data exfiltration countermeasures. Method: We used a Systematic Literature Review (SLR) method to select and review {92} papers. Results: The review has enabled us to (a) classify the ML approaches used in the countermeasures into data-driven, and behaviour-driven approaches, (b) categorize features into six types: behavioural, content-based, statistical, syntactical, spatial and temporal, (c) classify the evaluation datasets into simulated, synthesized, and real datasets and (d) identify 11 performance measures used by these studies. Conclusion: We conclude that: (i) the integration of data-driven and behaviour-driven approaches should be explored; (ii) There is a need of develo** high quality and large size evaluation datasets; (iii) Incremental ML model training should be incorporated in countermeasures; (iv) resilience to adversarial learning should be considered and explored during the development of countermeasures to avoid poisoning attacks; and (v) the use of automated feature engineering should be encouraged for efficiently detecting data exfiltration attacks. △ Less

Submitted 21 March, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.03754 [pdf]

Deep Learning Methods for Credit Card Fraud Detection

Authors: Thanh Thi Nguyen, Hammad Tahir, Mohamed Abdelrazek, Ali Babar

Abstract: Credit card frauds are at an ever-increasing rate and have become a major problem in the financial sector. Because of these frauds, card users are hesitant in making purchases and both the merchants and financial institutions bear heavy losses. Some major challenges in credit card frauds involve the availability of public data, high class imbalance in data, changing nature of frauds and the high n… ▽ More Credit card frauds are at an ever-increasing rate and have become a major problem in the financial sector. Because of these frauds, card users are hesitant in making purchases and both the merchants and financial institutions bear heavy losses. Some major challenges in credit card frauds involve the availability of public data, high class imbalance in data, changing nature of frauds and the high number of false alarms. Machine learning techniques have been used to detect credit card frauds but no fraud detection systems have been able to offer great efficiency to date. Recent development of deep learning has been applied to solve complex problems in various areas. This paper presents a thorough study of deep learning methods for the credit card fraud detection problem and compare their performance with various machine learning algorithms on three different financial datasets. Experimental results show great performance of the proposed deep learning methods against traditional machine learning models and imply that the proposed approaches can be implemented effectively for real-world credit card fraud detection systems. △ Less

Submitted 7 December, 2020; originally announced December 2020.

arXiv:2012.00544 [pdf, other]

Software Security Patch Management -- A Systematic Literature Review of Challenges, Approaches, Tools and Practices

Authors: Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, M. Ali Babar

Abstract: Context: Software security patch management purports to support the process of patching known software security vulnerabilities. Given the increasing recognition of the importance of software security patch management, it is important and timely to systematically review and synthesise the relevant literature on this topic. Objective: This paper aims at systematically reviewing the state of the a… ▽ More Context: Software security patch management purports to support the process of patching known software security vulnerabilities. Given the increasing recognition of the importance of software security patch management, it is important and timely to systematically review and synthesise the relevant literature on this topic. Objective: This paper aims at systematically reviewing the state of the art of software security patch management to identify the socio-technical challenges in this regard, reported solutions (i.e., approaches, tools, and practices), the rigour of the evaluation and the industrial relevance of the reported solutions, and to identify the gaps for future research. Method: We conducted a systematic literature review of 72 studies published from 2002 to March 2020, with extended coverage until September 2020 through forward snowballing. Results: We identify 14 socio-technical challenges, 18 solution approaches, tools and practices mapped onto the software security patch management process. We provide a map** between the solutions and challenges to enable a reader to obtain a holistic overview of the gap areas. The findings also reveal that only 20.8% of the reported solutions have been rigorously evaluated in industrial settings. Conclusion: Our results reveal that 50% of the common challenges have not been directly addressed in the solutions and that most of them (38.9%) address the challenges in one phase of the process, namely vulnerability scanning, assessment and prioritisation. Based on the results that highlight the important concerns in software security patch management and the lack of solutions, we recommend a list of future research directions. This study also provides useful insights about different opportunities for practitioners to adopt new solutions and understand the variations of their practical utility. △ Less

Submitted 19 August, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 45 pages, 7 figures

arXiv:2008.13009 [pdf]

Security Awareness of End-Users of Mobile Health Applications: An Empirical Study

Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

Abstract: Mobile systems offer portable and interactive computing, empowering users, to exploit a multitude of context-sensitive services, including mobile healthcare. Mobile health applications (i.e., mHealth apps) are revolutionizing the healthcare sector by enabling stakeholders to produce and consume healthcare services. A widespread adoption of mHealth technologies and rapid increase in mHealth apps en… ▽ More Mobile systems offer portable and interactive computing, empowering users, to exploit a multitude of context-sensitive services, including mobile healthcare. Mobile health applications (i.e., mHealth apps) are revolutionizing the healthcare sector by enabling stakeholders to produce and consume healthcare services. A widespread adoption of mHealth technologies and rapid increase in mHealth apps entail a critical challenge, i.e., lack of security awareness by end-users regarding health-critical data. This paper presents an empirical study aimed at exploring the security awareness of end-users of mHealth apps. We collaborated with two mHealth providers in Saudi Arabia to gather data from 101 end-users. The results reveal that despite having the required knowledge, end-users lack appropriate behaviour , i.e., reluctance or lack of understanding to adopt security practices, compromising health-critical data with social, legal, and financial consequences. The results emphasize that mHealth providers should ensure security training of end-users (e.g., threat analysis workshops), promote best practices to enforce security (e.g., multi-step authentication), and adopt suitable mHealth apps (e.g., trade-offs for security vs usability). The study provides empirical evidence and a set of guidelines about security awareness of mHealth apps. △ Less

Submitted 29 August, 2020; originally announced August 2020.

Comments: 10 pages, 4 figures, 5 tables

arXiv:2008.05845 [pdf, ps, other]

doi 10.1140/epjc/s10052-020-8346-3

Optical properties of Kerr-Newman spacetime in the presence of plasma

Authors: Gulmina Zaman Babar, Abdullah Zaman Babar, Farruh Atamurotov

Abstract: We have studied the null geodesics in the background of the Kerr-Newman black hole veiled by a plasma medium using the Hamilton-Jacobi method. The in uence of black hole's charge and plasma parameters on the effective potential and the generic photon orbits has been investigated. Furthermore, our discussion embodies the effects of black hole's charge, plasma and the inclination angle on the shadow… ▽ More We have studied the null geodesics in the background of the Kerr-Newman black hole veiled by a plasma medium using the Hamilton-Jacobi method. The in uence of black hole's charge and plasma parameters on the effective potential and the generic photon orbits has been investigated. Furthermore, our discussion embodies the effects of black hole's charge, plasma and the inclination angle on the shadow cast by the gravity with and without the spin parameter. We examined the energy released from the black hole as a result of the thermal radiations, which exclusively depends on the size of the shadow. The angle of de ection of the massless particles is also explored considering a weak-field approximation.We present our results in juxtaposition to the analogous black holes in General Relativity, particularly the Schwarzschild and Kerr black hole. △ Less

Submitted 9 June, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: 10 pages, 8 figures, some mistakes are corrected

Journal ref: Eur. Phys. J. C 80 (2020) 761; Eur. Phys. J. C 82 (2022) 405 (Erratum)

arXiv:2008.04467 [pdf, other]

doi 10.1145/3382494.3410693

Challenges in Docker Development: A Large-scale Study Using Stack Overflow

Authors: Mubin Ul Haque, Leonardo Horn Iwaya, M. Ali Babar

Abstract: Docker technology has been increasingly used among software developers in a multitude of projects. This growing interest is due to the fact that Docker technology supports a convenient process for creating and building containers, promoting close cooperation between developer and operations teams, and enabling continuous software delivery. As a fast-growing technology, it is important to identify… ▽ More Docker technology has been increasingly used among software developers in a multitude of projects. This growing interest is due to the fact that Docker technology supports a convenient process for creating and building containers, promoting close cooperation between developer and operations teams, and enabling continuous software delivery. As a fast-growing technology, it is important to identify the Docker-related topics that are most popular as well as existing challenges and difficulties that developers face. This paper presents a large-scale empirical study identifying practitioners' perspectives on Docker technology by mining posts from the Stack Overflow (SoF) community. Method: A dataset of 113,922 Docker-related posts was created based on a set of relevant tags and contents. The dataset was cleaned and prepared. Topic modelling was conducted using Latent Dirichlet Allocation (LDA), allowing the identification of dominant topics in the domain. Our results show that most developers use SoF to ask about a broad spectrum of Docker topics including framework development, application deployment, continuous integration, web-server configuration and many more. We determined that 30 topics that developers discuss can be grouped into 13 main categories. Most of the posts belong to categories of application development, configuration, and networking. On the other hand, we find that the posts on monitoring status, transferring data, and authenticating users are more popular among developers compared to the other topics. Specifically, developers face challenges in web browser issues, networking error and memory management. Besides, there is a lack of experts in this domain. Our research findings will guide future work on the development of new tools and techniques, hel** the community to focus efforts and understand existing trade-offs on Docker topics. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Comments: 11 pages, 3 Figures, conference

arXiv:2008.04176 [pdf, other]

doi 10.1145/3463274.3463331

A Large-scale Study of Security Vulnerability Support on Developer Q&A Websites

Authors: Triet H. M. Le, Roland Croft, David Hin, M. Ali Babar

Abstract: Context: Security Vulnerabilities (SVs) pose many serious threats to software systems. Developers usually seek solutions to addressing these SVs on developer Question and Answer (Q&A) websites. However, there is still little known about on-going SV-specific discussions on different developer Q&A sites. Objective: We present a large-scale empirical study to understand developers' SV discussions and… ▽ More Context: Security Vulnerabilities (SVs) pose many serious threats to software systems. Developers usually seek solutions to addressing these SVs on developer Question and Answer (Q&A) websites. However, there is still little known about on-going SV-specific discussions on different developer Q&A sites. Objective: We present a large-scale empirical study to understand developers' SV discussions and how these discussions are being supported by Q&A sites. Method: We first curate 71,329 SV posts from two large Q&A sites, namely Stack Overflow (SO) and Security StackExchange (SSE). We then use topic modeling to uncover the topics of SV-related discussions and analyze the popularity, difficulty, and level of expertise for each topic. We also perform a qualitative analysis to identify the types of solutions to SV-related questions. Results: We identify 13 main SV discussion topics on Q&A sites. Many topics do not follow the distributions and trends in expert-based security sources such as Common Weakness Enumeration (CWE) and Open Web Application Security Project (OWASP). We also discover that SV discussions attract more experts to answer than many other domains, but some difficult SV topics (e.g., Vulnerability Scanning Tools) still receive quite limited support from experts. Moreover, we identify seven key types of answers given to SV questions on Q&A sites, in which SO often provides code and instructions, while SSE usually gives experience-based advice and explanations. Conclusion: Our findings provide support for researchers and practitioners to effectively acquire, share and leverage SV knowledge on Q&A sites. △ Less

Submitted 21 April, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: Accepted for publication at the 25th International Conference on Evaluation and Assessment in Software Engineering (EASE 2021)

arXiv:2008.03034 [pdf]

An Empirical Study on Develo** Secure Mobile Health Apps: The Developers Perspective

Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

Abstract: Mobile apps exploit embedded sensors and wireless connectivity of a device to empower users with portable computations, context-aware communication, and enhanced interaction. Specifically, mobile health apps (mHealth apps for short) are becoming integral part of mobile and pervasive computing to improve the availability and quality of healthcare services. Despite the offered benefits, mHealth apps… ▽ More Mobile apps exploit embedded sensors and wireless connectivity of a device to empower users with portable computations, context-aware communication, and enhanced interaction. Specifically, mobile health apps (mHealth apps for short) are becoming integral part of mobile and pervasive computing to improve the availability and quality of healthcare services. Despite the offered benefits, mHealth apps face a critical challenge, i.e., security of health critical data that is produced and consumed by the app. Several studies have revealed that security specific issues of mHealth apps have not been adequately addressed. The objectives of this study are to empirically (a) investigate the challenges that hinder development of secure mHealth apps, (b) identify practices to develop secure apps, and (c) explore motivating factors that influence secure development. We conducted this study by collecting responses of 97 developers from 25 countries, across 06 continents, working in diverse teams and roles to develop mHealth apps for Android, iOS, and Windows platform. Qualitative analysis of the survey data is based on (i) 8 critical challenges, (ii) taxonomy of best practices to ensure security, and (iii) 6 motivating factors that impact secure mHealth apps. This research provides empirical evidence as practitioners view and guidelines to develop emerging and next generation of secure mHealth apps. △ Less

Submitted 7 August, 2020; originally announced August 2020.

Comments: 10 pages, 5 figures

arXiv:2007.15826 [pdf, other]

The Impact of Distance on Performance and Scalability of Distributed Database Systems in Hybrid Clouds

Authors: Yaser Mansouri, M. Ali Babar

Abstract: The increasing need for managing big data has led the emergence of advanced database management systems. There has been increased efforts aimed at evaluating the performance and scalability of NoSQL and Relational databases hosted by either private or public cloud datacenters. However, there has been little work on evaluating the performance and scalability of these databases in hybrid clouds, whe… ▽ More The increasing need for managing big data has led the emergence of advanced database management systems. There has been increased efforts aimed at evaluating the performance and scalability of NoSQL and Relational databases hosted by either private or public cloud datacenters. However, there has been little work on evaluating the performance and scalability of these databases in hybrid clouds, where the distance between private and public cloud datacenters can be one of the key factors that can affect their performance. Hence, in this paper, we present a detailed evaluation of throughput, scalability, and VMs size vs. VMs number for six modern databases in a hybrid cloud, consisting of a private cloud in Adelaide and Azure based datacenter in Sydney, Mumbai, and Virginia regions. Based on results, as the distance between private and public clouds increases, the throughput performance of most databases reduces. Second, MongoDB obtains the best throughput performance, followed by MySQL C luster, whilst Cassandra exposes the most fluctuation in through performance. Third, vertical scalability improves the throughput of databases more than the horizontal scalability. Forth, exploiting bigger VMs rather than more VMs with less cores can increase throughput performance for Cassandra, Riak, and Redis. △ Less

Submitted 30 July, 2020; originally announced July 2020.

Comments: 26 pages

arXiv:2007.10876 [pdf]

Challenges in Develo** Secure Mobile Health Applications, A Systematic Review

Authors: Bakheet Aljedaani, M. Ali Babar

Abstract: Mobile health (mHealth) applications (apps) have gained significant popularity over the last few years due to its tremendous benefits, such as lowering healthcare cost and increasing patient awareness. However, the sensitivity of healthcare data makes the security of mHealth apps a serious concern. In this review, we aim to identify and analyse the reported challenges that the developers of mHealt… ▽ More Mobile health (mHealth) applications (apps) have gained significant popularity over the last few years due to its tremendous benefits, such as lowering healthcare cost and increasing patient awareness. However, the sensitivity of healthcare data makes the security of mHealth apps a serious concern. In this review, we aim to identify and analyse the reported challenges that the developers of mHealth apps face concerning security. Additionally, our study aimed to develop a conceptual framework with the challenges faced by mHealth apps development organization for develo** secure apps. The knowledge of such challenges can help to reduce the risk of develo** insecure mHealth apps. We followed the Systematic Literature Review method for this review. We selected studies that have been published between January 2008 and October 2020. We selected 32 primary studies using predefined criteria and used thematic analysis method for analysing the extracted data. We identified nine challenges that can affect the development of secure mHealth apps. Such as 1) lack of security guidelines and regulations for develo** secure mHealth apps, 2) developers lack of knowledge and expertise for secure mHealth app development, 3) lack of stakeholders involvement during mHealth app development, etc . Based on our analysis, we have presented a conceptual framework which highlights the correlation between the identified challenges. We conclude that our findings can help them identify their weaknesses and improve their security practices. Similarly, mHealth apps developers can identify the challenges they face to develop mHealth apps that do not pose security risks for users. Our review is a step towards providing insights into the development of secure mHealth apps. Our proposed conceptual framework can act as a practice guideline for practitioners to enhance secure mHealth apps development. △ Less

Submitted 15 January, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: This paper has 5 figures and 1 table

arXiv:2006.14185 [pdf, ps, other]

Optimizing Affine Maximizer Auctions via Linear Programming: an Application to Revenue Maximizing Mechanism Design for Zero-Day Exploits Markets

Authors: Mingyu Guo, Hideaki Hata, Ali Babar

Abstract: Optimizing within the affine maximizer auctions (AMA) is an effective approach for revenue maximizing mechanism design. The AMA mechanisms are strategy-proof and individually rational (if the agents' valuations for the outcomes are nonnegative). Every AMA mechanism is characterized by a list of parameters. By focusing on the AMA mechanisms, we turn mechanism design into a value optimization proble… ▽ More Optimizing within the affine maximizer auctions (AMA) is an effective approach for revenue maximizing mechanism design. The AMA mechanisms are strategy-proof and individually rational (if the agents' valuations for the outcomes are nonnegative). Every AMA mechanism is characterized by a list of parameters. By focusing on the AMA mechanisms, we turn mechanism design into a value optimization problem, where we only need to adjust the parameters. We propose a linear programming based heuristic for optimizing within the AMA family. We apply our technique to revenue maximizing mechanism design for zero-day exploit markets. We show that due to the nature of the zero-day exploit markets, if there are only two agents (one offender and one defender), then our technique generally produces a near optimal mechanism: the mechanism's expected revenue is close to the optimal revenue achieved by the optimal strategy-proof and individually rational mechanism (not necessarily an AMA mechanism). △ Less

Submitted 25 June, 2020; originally announced June 2020.

Journal ref: PRIMA 2017: Principles and Practice of Multi-Agent Systems

arXiv:2006.14184 [pdf, ps, other]

Revenue Maximizing Markets for Zero-Day Exploits

Authors: Mingyu Guo, Hideaki Hata, Ali Babar

Abstract: Markets for zero-day exploits (software vulnerabilities unknown to the vendor) have a long history and a growing popularity. We study these markets from a revenue-maximizing mechanism design perspective. We first propose a theoretical model for zero-day exploits markets. In our model, one exploit is being sold to multiple buyers. There are two kinds of buyers, which we call the defenders and the o… ▽ More Markets for zero-day exploits (software vulnerabilities unknown to the vendor) have a long history and a growing popularity. We study these markets from a revenue-maximizing mechanism design perspective. We first propose a theoretical model for zero-day exploits markets. In our model, one exploit is being sold to multiple buyers. There are two kinds of buyers, which we call the defenders and the offenders. The defenders are buyers who buy vulnerabilities in order to fix them (e.g., software vendors). The offenders, on the other hand, are buyers who intend to utilize the exploits (e.g., national security agencies and police). Our model is more than a single-item auction. First, an exploit is a piece of information, so one exploit can be sold to multiple buyers. Second, buyers have externalities. If one defender wins, then the exploit becomes worthless to the offenders. Third, if we disclose the details of the exploit to the buyers before the auction, then they may leave with the information without paying. On the other hand, if we do not disclose the details, then it is difficult for the buyers to come up with their private valuations. Considering the above, our proposed mechanism discloses the details of the exploit to all offenders before the auction. The offenders then pay to delay the exploit being disclosed to the defenders. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Journal ref: PRIMA 2016: Principles and Practice of Multi-Agent Systems

arXiv:2006.14177 [pdf, ps, other]

Cost Sharing Security Information with Minimal Release Delay

Authors: Mingyu Guo, Yong Yang, Muhammad Ali Babar

Abstract: We study a cost sharing problem derived from bug bounty programs, where agents gain utility by the amount of time they get to enjoy the cost shared information. Once the information is provided to an agent, it cannot be retracted. The goal, instead of maximizing revenue, is to pick a time as early as possible, so that enough agents are willing to cost share the information and enjoy it for a premi… ▽ More We study a cost sharing problem derived from bug bounty programs, where agents gain utility by the amount of time they get to enjoy the cost shared information. Once the information is provided to an agent, it cannot be retracted. The goal, instead of maximizing revenue, is to pick a time as early as possible, so that enough agents are willing to cost share the information and enjoy it for a premium time period, while other agents wait and enjoy the information for free after a certain amount of release delay. We design a series of mechanisms with the goal of minimizing the maximum delay and the total delay. Under prior-free settings, our final mechanism achieves a competitive ratio of $4$ in terms of maximum delay, against an undominated mechanism. Finally, we assume some distributions of the agents' valuations, and investigate our mechanism's performance in terms of expected delays. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Journal ref: PRIMA 2018: Principles and Practice of Multi-Agent Systems

arXiv:2006.12069 [pdf, other]

doi 10.1109/ACCESS.2020.3015962

Security and Privacy for mHealth and uHealth Systems: a Systematic Map** Study

Authors: Leonardo Horn Iwaya, Aakash Ahmad, M. Ali Babar

Abstract: An increased adoption of mobile health (mHealth) and ubiquitous health (uHealth) systems empower users with handheld devices and embedded sensors for a broad range of healthcare services. However, m/uHealth systems face significant challenges related to data security and privacy that must be addressed to increase the pervasiveness of such systems. This study aims to systematically identify, classi… ▽ More An increased adoption of mobile health (mHealth) and ubiquitous health (uHealth) systems empower users with handheld devices and embedded sensors for a broad range of healthcare services. However, m/uHealth systems face significant challenges related to data security and privacy that must be addressed to increase the pervasiveness of such systems. This study aims to systematically identify, classify, compare, and evaluate state-of-the-art on security and privacy of m/uHealth systems. We conducted a systematic map** study (SMS) based on 365 qualitatively selected studies to (i) classify the types, frequency, and demography of published research and (ii) synthesize and categorize research themes, (iii) recurring challenges, (iv) prominent solutions (i.e., research outcomes) and their (v) reported evaluations (i.e., practical validations). Results suggest that the existing research on security and privacy of m/uHealth systems primarily focuses on select group of control families (compliant with NIST800-53), protection of systems and information, access control, authentication, individual participation, and privacy authorisation. In contrast, areas of data governance, security and privacy policies, and program management are under-represented, although these are critical to most of the organizations that employ m/uHealth systems. Most research proposes new solutions with limited validation, reflecting a lack of evaluation of security and privacy of m/uHealth in the real world. Empirical research, development, and validation of m/uHealth security and privacy is still incipient, which may discourage practitioners from readily adopting solutions from the literature. This SMS facilitates knowledge transfer, enabling researchers and practitioners to engineer security and privacy for emerging and next generation of m/uHealth systems. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: 29 pages, 10 figures, in IEEE Access, 2020

arXiv:2006.02833 [pdf, other]

An Automated Implementation of Hybrid Cloud for Performance Evaluation of Distributed Databases

Authors: Yaser Mansouri, Victor Prokhorenko, M. Ali Babar

Abstract: A Hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This model of applications deployment is called cloud bursting that allows data-intensive applications especially distributed database systems to have the benefi… ▽ More A Hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This model of applications deployment is called cloud bursting that allows data-intensive applications especially distributed database systems to have the benefit of both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust and zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of hybrid cloud. We also explore performance evaluation of cloud bursting for six modern and distributed database systems on the hybrid cloud spanning over local OpenStack and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster work efficient in terms of throughput and operations latency if they burst into a public cloud to supply their resources. In contrast, the performance of Cassandra, Riak, Redis, and Couchdb reduces if they significantly leverage their required resources via cloud bursting. △ Less

Submitted 4 June, 2020; originally announced June 2020.

Journal ref: Journal of Network and Computer Applications (JNCA), 2020

arXiv:2005.08454 [pdf, other]

Reliability and Robustness analysis of Machine Learning based Phishing URL Detectors

Authors: Bushra Sabir, M. Ali Babar, Raj Gaire, Alsharif Abuadbba

Abstract: ML-based Phishing URL (MLPU) detectors serve as the first level of defence to protect users and organisations from being victims of phishing attacks. Lately, few studies have launched successful adversarial attacks against specific MLPU detectors raising questions about their practical reliability and usage. Nevertheless, the robustness of these systems has not been extensively investigated. There… ▽ More ML-based Phishing URL (MLPU) detectors serve as the first level of defence to protect users and organisations from being victims of phishing attacks. Lately, few studies have launched successful adversarial attacks against specific MLPU detectors raising questions about their practical reliability and usage. Nevertheless, the robustness of these systems has not been extensively investigated. Therefore, the security vulnerabilities of these systems, in general, remain primarily unknown which calls for testing the robustness of these systems. In this article, we have proposed a methodology to investigate the reliability and robustness of 50 representative state-of-the-art MLPU models. Firstly, we have proposed a cost-effective Adversarial URL generator URLBUG that created an Adversarial URL dataset. Subsequently, we reproduced 50 MLPU (traditional ML and Deep learning) systems and recorded their baseline performance. Lastly, we tested the considered MLPU systems on Adversarial Dataset and analyzed their robustness and reliability using box plots and heat maps. Our results showed that the generated adversarial URLs have valid syntax and can be registered at a median annual price of \$11.99. Out of 13\% of the already registered adversarial URLs, 63.94\% were used for malicious purposes. Moreover, the considered MLPU models Matthew Correlation Coefficient (MCC) dropped from a median 0.92 to 0.02 when tested against $Adv_\mathrm{data}$, indicating that the baseline MLPU models are unreliable in their current form. Further, our findings identified several security vulnerabilities of these systems and provided future directions for researchers to design dependable and secure MLPU systems. △ Less

Submitted 24 November, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: Accepted in Transactions of Dependable and Secure Computing (SI-Reliability and Robustness in AI-Based Cybersecurity Solutions)

arXiv:2005.07883 [pdf]

Architectural Design Space for Modelling and Simulation as a Service: A Review

Authors: Mojtaba Shahin, M. Ali Babar, Muhammad Aufeef Chauhan

Abstract: Modelling and Simulation as a Service (MSaaS) is a promising approach to deploy and execute Modelling and Simulation (M&S) applications quickly and on-demand. An appropriate software architecture is essential to deliver quality M&S applications following the MSaaS concept to a wide range of users. This study aims to characterize the state-of-the-art MSaaS architectures by conducting a systematic r… ▽ More Modelling and Simulation as a Service (MSaaS) is a promising approach to deploy and execute Modelling and Simulation (M&S) applications quickly and on-demand. An appropriate software architecture is essential to deliver quality M&S applications following the MSaaS concept to a wide range of users. This study aims to characterize the state-of-the-art MSaaS architectures by conducting a systematic review of 31 papers published from 2010 to 2018. Our findings reveal that MSaaS applications are mainly designed using layered architecture style, followed by service-oriented architecture, component-based architecture, and pluggable component-based architecture. We also found that interoperability and deployability have the greatest importance in the architecture of MSaaS applications. In addition, our study indicates that the current MSaaS architectures do not meet the critical user requirements of modern M&S applications appropriately. Based on our results, we recommend that there is a need for more effort and research to (1) design the user interfaces that enable users to build and configure simulation models with minimum effort and limited domain knowledge, (2) provide mechanisms to improve the deployability of M&S applications, and (3) gain a deep insight into how M&S applications should be architected to respond to the emerging user requirements in the military domain. △ Less

Submitted 31 July, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: 38 Pages, To appear in Journal of Systems and Software (JSS), 2020

arXiv:2003.06108 [pdf]

On the Role of Software Architecture in DevOps Transformation: An Industrial Case Study

Authors: Mojtaba Shahin, M. Ali Babar

Abstract: Development and Operations (DevOps), a particular type of Continuous Software Engineering, has become a popular Software System Engineering paradigm. Software architecture is critical in succeeding with DevOps. However, there is little evidence-based knowledge of how software systems are architected in the industry to enable and support DevOps. Since architectural decisions, along with their ratio… ▽ More Development and Operations (DevOps), a particular type of Continuous Software Engineering, has become a popular Software System Engineering paradigm. Software architecture is critical in succeeding with DevOps. However, there is little evidence-based knowledge of how software systems are architected in the industry to enable and support DevOps. Since architectural decisions, along with their rationales and implications, are very important in the architecting process, we performed an industrial case study that has empirically identified and synthesized the key architectural decisions considered essential to DevOps transformation by two software development teams. Our study also reveals that apart from the chosen architecture style, DevOps works best with modular architectures. In addition, we found that the performance of the studied teams can improve in DevOps if operations specialists are added to the teams to perform the operations tasks that require advanced expertise. Finally, investment in testing is inevitable for the teams if they want to release software changes faster. △ Less

Submitted 13 March, 2020; originally announced March 2020.

Comments: 10 pages, To appear in International Conference on Software and Systems Process (ICSSP 2020)

arXiv:2003.03741 [pdf]

doi 10.1145/3379597.3387443

PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning

Authors: Triet H. M. Le, David Hin, Roland Croft, M. Ali Babar

Abstract: Security is an increasing concern in software development. Developer Question and Answer (Q&A) websites provide a large amount of security discussion. Existing studies have used human-defined rules to mine security discussions, but these works still miss many posts, which may lead to an incomplete analysis of the security practices reported on Q&A websites. Traditional supervised Machine Learning… ▽ More Security is an increasing concern in software development. Developer Question and Answer (Q&A) websites provide a large amount of security discussion. Existing studies have used human-defined rules to mine security discussions, but these works still miss many posts, which may lead to an incomplete analysis of the security practices reported on Q&A websites. Traditional supervised Machine Learning methods can automate the mining process; however, the required negative (non-security) class is too expensive to obtain. We propose a novel learning framework, PUMiner, to automatically mine security posts from Q&A websites. PUMiner builds a context-aware embedding model to extract features of the posts, and then develops a two-stage PU model to identify security content using the labelled Positive and Unlabelled posts. We evaluate PUMiner on more than 17.2 million posts on Stack Overflow and 52,611 posts on Security StackExchange. We show that PUMiner is effective with the validation performance of at least 0.85 across all model configurations. Moreover, Matthews Correlation Coefficient (MCC) of PUMiner is 0.906, 0.534 and 0.084 points higher than one-class SVM, positive-similarity filtering, and one-stage PU models on unseen testing posts, respectively. PUMiner also performs well with an MCC of 0.745 for scenarios where string matching totally fails. Even when the ratio of the labelled positive posts to the unlabelled ones is only 1:100, PUMiner still achieves a strong MCC of 0.65, which is 160% better than fully-supervised learning. Using PUMiner, we provide the largest and up-to-date security content on Q&A websites for practitioners and researchers. △ Less

Submitted 8 March, 2020; originally announced March 2020.

Comments: Accepted for publication at the 17th Mining Software Repositories 2020 conference

arXiv:2002.11382 [pdf, other]

Mechanism Design for Public Projects via Neural Networks

Authors: Guanhua Wang, Runqi Guo, Yuko Sakurai, Ali Babar, Mingyu Guo

Abstract: We study mechanism design for nonexcludable and excludable binary public project problems. We aim to maximize the expected number of consumers and the expected social welfare. For the nonexcludable public project model, we identify a sufficient condition on the prior distribution for the conservative equal costs mechanism to be the optimal strategy-proof and individually rational mechanism. For ge… ▽ More We study mechanism design for nonexcludable and excludable binary public project problems. We aim to maximize the expected number of consumers and the expected social welfare. For the nonexcludable public project model, we identify a sufficient condition on the prior distribution for the conservative equal costs mechanism to be the optimal strategy-proof and individually rational mechanism. For general distributions, we propose a dynamic program that solves for the optimal mechanism. For the excludable public project model, we identify a similar sufficient condition for the serial cost sharing mechanism to be optimal for $2$ and $3$ agents. We derive a numerical upper bound. Experiments show that for several common distributions, the serial cost sharing mechanism is close to optimality. The serial cost sharing mechanism is not optimal in general. We design better performing mechanisms via neural networks. Our approach involves several technical innovations that can be applied to mechanism design in general. We interpret the mechanisms as price-oriented rationing-free (PORF) mechanisms, which enables us to move the mechanism's complex (e.g., iterative) decision making off the network, to a separate program. We feed the prior distribution's analytical form into the cost function to provide quality gradients for training. We use supervision to manual mechanisms as a systematic way for initialization. Our approach of "supervision and then gradient descent" is effective for improving manual mechanisms' performances. It is also effective for fixing constraint violations for heuristic-based mechanisms that are infeasible. △ Less

Submitted 26 February, 2020; originally announced February 2020.

arXiv:2002.09190 [pdf]

doi 10.1145/3305268

A Multi-Vocal Review of Security Orchestration

Authors: Chadni Islam, M. Ali Babar, Surya Nepal

Abstract: Organizations use diverse types of security solutions to prevent cyberattacks. Multiple vendors provide security solutions developed using heterogeneous technologies and paradigms. Hence, it is a challenging rather impossible to easily make security solutions to work an integrated fashion. Security orchestration aims at smoothly integrating multivendor security tools that can effectively and effic… ▽ More Organizations use diverse types of security solutions to prevent cyberattacks. Multiple vendors provide security solutions developed using heterogeneous technologies and paradigms. Hence, it is a challenging rather impossible to easily make security solutions to work an integrated fashion. Security orchestration aims at smoothly integrating multivendor security tools that can effectively and efficiently interoperate to support security staff of a Security Operation Centre (SOC). Given the increasing role and importance of security orchestration, there has been an increasing amount of literature on different aspects of security orchestration solutions. However, there has been no effort to systematically review and analyze the reported solutions. We report a Multivocal Literature Review that has systematically selected and reviewed both academic and grey (blogs, web pages, white papers) literature on different aspects of security orchestration published from January 2007 until July 2017. The review has enabled us to provide a working definition of security orchestration and classify the main functionalities of security orchestration into three main areas: unification, orchestration, and automation. We have also identified the core components of a security orchestration platform and categorized the drivers of security orchestration based on technical and socio-technical aspects. We also provide a taxonomy of security orchestration based on the execution environment, automation strategy, deployment type, mode of task and resource type. This review has helped us to reveal several areas of further research and development in security orchestration. △ Less

Submitted 21 February, 2020; originally announced February 2020.

Comments: This paper is published in ACM Computing Survey

ACM Class: D.2

Journal ref: ACM Comput. Surv. 52, 2, Article 37 (April 2019), 45 pages

arXiv:2002.05442 [pdf, other]

doi 10.1145/3383458

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

Authors: Triet H. M. Le, Hao Chen, M. Ali Babar

Abstract: Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we p… ▽ More Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation. To address the limitations of the traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. After that, we introduce recent DL mechanisms suitable to solve such problems. Then, we present the state-of-the-art practices and discuss their challenges with some recommendations for practitioners and researchers as well. △ Less

Submitted 13 February, 2020; originally announced February 2020.

Journal ref: ACM Comput. Surv., 53, 3 (2020), Article 62

Showing 51–100 of 113 results for author: Babar, A