
Showing 1–50 of 98 results for author: Babar, M A

Searching in archive cs.
  1. arXiv:2406.19765 [pdf, other]

    cs.SE cs.LG

    Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, M. Ali Babar

    Abstract: Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensivel…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to be published in IEEE Access

  2. arXiv:2406.18813 [pdf, other]

    cs.CR cs.DC cs.SE

    Towards Secure Management of Edge-Cloud IoT Microservices using Policy as Code

    Authors: Samodha Pallewatta, Muhammad Ali Babar

    Abstract: IoT application providers increasingly use MicroService Architecture (MSA) to develop applications that convert IoT data into valuable information. The independently deployable and scalable nature of microservices enables dynamic utilization of edge and cloud resources provided by various service providers, thus improving performance. However, IoT data security should be ensured during multi-domai…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, Accepted for full paper presentation at ECSA 2024 conference

  3. arXiv:2406.09737 [pdf, other]

    cs.SE

    A Multivocal Review of MLOps Practices, Challenges and Open Issues

    Authors: Beyza Eken, Samodha Pallewatta, Nguyen Khoi Tran, Ayse Tosun, Muhammad Ali Babar

    Abstract: With the increasing trend of Machine Learning (ML) enabled software applications, the paradigm of ML Operations (MLOps) has gained tremendous attention of researchers and practitioners. MLOps encompasses the practices and technologies for streamlining the resources and monitoring needs of operationalizing ML models. Software development practitioners need access to the detailed and easily understa…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 45 pages, 4 figures

  4. arXiv:2406.04902 [pdf, other]

    cs.ET

    Beyond Data, Towards Sustainability: A Sydney Case Study on Urban Digital Twins

    Authors: Ammar Sohail, Bojie Shen, Muhammad Aamir Cheema, Mohammed Eunus Ali, Anwaar Ulhaq, Muhammad Ali Babar, Asama Qureshi

    Abstract: As urban areas grapple with unprecedented challenges stemming from population growth and climate change, the emergence of urban digital twins offers a promising solution. This paper presents a case study focusing on Sydney's urban digital twin, a virtual replica integrating diverse real-time and historical data, including weather, crime, emissions, and traffic. Through advanced visualization and d…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2405.15293 [pdf, other]

    cs.CR

    Transaction Fee Estimation in the Bitcoin System

    Authors: Limeng Zhang, Rui Zhou, Qing Liu, Chengfei Liu, M. Ali Babar

    Abstract: In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and almost cannot be altered thereafter.…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2404.17110 [pdf, other]

    cs.SE cs.CR cs.LG

    Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

    Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

    Abstract: Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

  7. arXiv:2404.11294 [pdf, other]

    cs.SE

    LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking

    Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

    Abstract: Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention as they strike a balance between relaxed labeled data re…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 23 pages with 11 figures

  8. arXiv:2404.06043 [pdf, other]

    cs.DB

    Automatic Configuration Tuning on Cloud Database: A Survey

    Authors: Limeng Zhang, M. Ali Babar

    Abstract: Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial as they are notorious for having dozens of configurable knobs, such as hardware se…

    Submitted 9 April, 2024; originally announced April 2024.

  9. arXiv:2404.03823 [pdf, other]

    cs.CR cs.CL cs.CY

    An Investigation into Misuse of Java Security APIs by Large Language Models

    Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar

    Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding sof…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ACM ASIACCS 2024

  10. arXiv:2401.13199 [pdf, other]

    cs.CR cs.CY cs.HC

    Why People Still Fall for Phishing Emails: An Empirical Investigation into How Users Make Email Response Decisions

    Authors: Asangi Jayatilaka, Nalin Asanka Gamagedara Arachchilage, Muhammad Ali Babar

    Abstract: Despite technical and non-technical countermeasures, humans continue to be tricked by phishing emails. How users make email response decisions is a missing piece in the puzzle to identifying why people still fall for phishing emails. We conducted an empirical study using a think-aloud method to investigate how people make 'response decisions' while reading emails. The grounded theory analysis of t…

    Submitted 23 January, 2024; originally announced January 2024.

    Journal ref: Symposium on Usable Security and Privacy (USEC) 2024

  11. arXiv:2401.11105 [pdf, other]

    cs.SE cs.CR cs.LG

    Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

    Authors: Triet H. M. Le, Xiaoning Du, M. Ali Babar

    Abstract: Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the useful…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted as a full paper in the technical track at the 21st International Conference on Mining Software Repositories (MSR) 2024

  12. arXiv:2312.06056 [pdf, other]

    cs.SE cs.AI cs.CL

    METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

    Authors: Sangwon Hyun, Mingyu Guo, M. Ali Babar

    Abstract: Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have l…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted to International Conference on Software Testing, Verification and Validation (ICST) 2024 / Key words: Large-language models, Metamorphic testing, Quality evaluation, Text perturbations

  13. arXiv:2310.06300 [pdf, other]

    cs.CR cs.SE

    An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

    Authors: Nguyen Khoi Tran, Samodha Pallewatta, M. Ali Babar

    Abstract: With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Ado…

    Submitted 8 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted for full paper presentation at EASE 2024 conference

  14. arXiv:2310.00635 [pdf, other]

    cs.NI

    Reinforcement Learning Based Neighbour Selection for VANET with Adaptive Trust Management

    Authors: Orvila Sarker, Hong Shen, M. Ali Babar

    Abstract: Successful information propagation from source to destination in Vehicular Adhoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad-hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood informatio…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: This article is accepted at the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) 2023

  15. arXiv:2308.11862 [pdf, other]

    cs.CR cs.SE

    Empirical Analysis of Software Vulnerabilities Causing Timing Side Channels

    Authors: M. Mehdi Kholoosi, M. Ali Babar, Cemal Yilmaz

    Abstract: Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be ill-treated to obtain an encryption key. Whilst several effor…

    Submitted 22 August, 2023; originally announced August 2023.

  16. arXiv:2307.04458 [pdf, other]

    cs.SE

    Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu

    Authors: Victor Prokhorenko, Chadni Islam, Muhammad Ali Babar

    Abstract: An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. When a multitude of independent packages are placed together in an OS, an implicit inter-package architecture is formed. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: This paper is accepted for publication in the 17th international conference on Software Architecture

  17. arXiv:2307.01225 [pdf, other]

    cs.CL cs.AI cs.LG

    Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

    Authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba

    Abstract: Transformer-based text classifiers like BERT, Roberta, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven…

    Submitted 2 July, 2023; originally announced July 2023.

  18. arXiv:2306.08869 [pdf, other]

    cs.CR cs.SE

    Detecting Misuse of Security APIs: A Systematic Review

    Authors: Zahra Mousavi, Chadni Islam, M. Ali Babar, Alsharif Abuadbba, Kristen Moore

    Abstract: Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and…

    Submitted 25 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  19. arXiv:2306.06600 [pdf, other]

    cs.CY

    Enabling Spatial Digital Twins: Technologies, Challenges, and Future Research Directions

    Authors: Mohammed Eunus Ali, Muhammad Aamir Cheema, Tanzima Hashem, Anwaar Ulhaq, Muhammad Ali Babar

    Abstract: A Digital Twin (DT) is a virtual replica of a physical object or system, created to monitor, analyze, and optimize its behavior and characteristics. A Spatial Digital Twin (SDT) is a specific type of digital twin that emphasizes the geospatial aspects of the physical entity, incorporating precise location and dimensional attributes for a comprehensive understanding within its spatial environment.…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 26 pages, 2 figures

  20. arXiv:2305.12736   

    cs.SE

    Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consumes a lot of time and effort. Hence, there is an urgen…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  21. arXiv:2305.12695   

    cs.SE cs.LG

    Systematic Literature Review on Application of Machine Learning in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metr…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  22. arXiv:2305.11657 [pdf, other]

    cs.GT

    Cost Sharing Public Project with Minimum Release Delay

    Authors: Mingyu Guo, Diksha Goel, Guanhua Wang, Yong Yang, Muhammad Ali Babar

    Abstract: We study the excludable public project model where the decision is binary (build or not build). In a classic excludable and binary public project model, an agent either consumes the project in its whole or is completely excluded. We study a setting where the mechanism can set different project release time for different agents, in the sense that high-paying agents can consume the project earlier t…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.07315

  23. arXiv:2304.02829 [pdf, other]

    cs.SE cs.LG

    SoK: Machine Learning for Continuous Integration

    Authors: Ali Kazemi Arani, Mansooreh Zahedi, Triet Huynh Minh Le, Muhammad Ali Babar

    Abstract: Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: 6 pages, 2 figures, accepted in the ICSE'23 Workshop on Cloud Intelligence / AIOps

  24. arXiv:2301.05456 [pdf, other]

    cs.SE

    Data Quality for Software Vulnerability Datasets

    Authors: Roland Croft, M. Ali Babar, Mehdi Kholoosi

    Abstract: The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability datasets used for training and benchmarking. However, we observe that the quality of the data powering these solutions is currently ill-considered, hindering the reli…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted for publication in the ICSE 23 Technical Track

  25. Privacy Engineering in the Wild: Understanding the Practitioners' Mindset, Organisational Aspects, and Current Practices

    Authors: Leonardo Horn Iwaya, Muhammad Ali Babar, Awais Rashid

    Abstract: Privacy engineering, as an emerging field of research and practice, comprises the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. For that, software practitioners and other stakeholders in software companies need to work cooperatively toward building privacy-preserving businesses and engineering solutions. S…

    Submitted 30 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures

  26. arXiv:2211.07585 [pdf]

    cs.CY cs.SE

    An Empirical Study on Secure Usage of Mobile Health Apps: The Attack Simulation Approach

    Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

    Abstract: Mobile applications, mobile apps for short, have proven their usefulness in enhancing service provisioning across a multitude of domains that range from smart healthcare, to mobile commerce, and areas of context sensitive computing. In recent years, a number of empirically grounded, survey-based studies have been conducted to investigate secure development and usage of mHealth apps. However, such…

    Submitted 14 November, 2022; originally announced November 2022.

  27. arXiv:2211.06953 [pdf, other]

    cs.SE

    Collaborative Application Security Testing for DevSecOps: An Empirical Analysis of Challenges, Best Practices and Tool Support

    Authors: Roshan Namal Rajapakse, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: DevSecOps is a software development paradigm that places a high emphasis on the culture of collaboration between developers (Dev), security (Sec) and operations (Ops) teams to deliver secure software continuously and rapidly. Adopting this paradigm effectively, therefore, requires an understanding of the challenges, best practices and available solutions for collaboration among these functional te…

    Submitted 25 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Submitted to the Empirical Software Engineering journal_v2

  28. arXiv:2210.06679 [pdf, other]

    cs.DC

    A Survey on UAV-enabled Edge Computing: Resource Management Perspective

    Authors: Xiaoyu Xia, Sheik Mohammad Mostakim Fattah, Muhammad Ali Babar

    Abstract: Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where…

    Submitted 26 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 36 pages, Accepted to ACM CSUR

  29. arXiv:2209.09487 [pdf, other]

    cs.DB

    Design and Implementation of Fragmented Clouds for Evaluation of Distributed Databases

    Authors: Yaser Mansouri, Faheem Ullah, Shagun Dhingra, M. Ali Babar

    Abstract: In this paper, we present a Fragmented Hybrid Cloud (FHC) that provides a unified view of multiple geographically distributed private cloud datacenters. FHC leverages a fragmented usage model in which outsourcing is bi-directional across private clouds that can be hosted by static and mobile entities. The mobility aspect of private cloud nodes has important impact on the FHC performance in terms o…

    Submitted 20 September, 2022; originally announced September 2022.

  30. arXiv:2209.07869 [pdf, other]

    cs.SE cs.LG

    LogGD: Detecting Anomalies from System Logs by Graph Neural Networks

    Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

    Abstract: Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies.…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: 12 pages, 12 figures

  31. An Empirical Study of Automation in Software Security Patch Management

    Authors: Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: Several studies have shown that automated support for different activities of the security patch management process has great potential for reducing delays in installing security patches. However, it is also important to understand how automation is used in practice, its limitations in meeting real-world needs and what practitioners really need, an area that has not been empirically investigated i…

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 13 pages, 2 figures

  32. arXiv:2206.10110 [pdf, other]

    cs.SE

    ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

    Authors: Nguyen Khoi Tran, Bushra Sabir, M. Ali Babar, Nini Cui, Mehran Abolhasan, Justin Lipman

    Abstract: Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fai…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted as full paper in ECSA 2022 conference. To be presented

  33. Mod2Dash: A Framework for Model-Driven Dashboards Generation

    Authors: Liuyue Jiang, Nguyen Khoi Tran, M. Ali Babar

    Abstract: The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, pr…

    Submitted 15 May, 2022; originally announced May 2022.

  34. arXiv:2203.12132 [pdf, other]

    cs.SE

    Runtime Software Patching: Taxonomy, Survey and Future Directions

    Authors: Chadni Islam, Victor Prokhorenko, M. Ali Babar

    Abstract: Runtime software patching aims to minimize or eliminate service downtime, user interruptions and potential data losses while deploying a patch. Due to modern software systems' high variance and heterogeneity, no universal solutions are available or proposed to deploy and execute patches at runtime. Existing runtime software patching solutions focus on specific cases, scenarios, programming languag…

    Submitted 22 February, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

  35. A Framework for Automating Deployment and Evaluation of Blockchain Network

    Authors: Nguyen Khoi Tran, M. Ali Babar, Andrew Walters

    Abstract: Blockchain network deployment and evaluation have become prevalent due to the demand for private blockchains by enterprises, governments, and edge computing systems. Whilst a blockchain network's deployment and evaluation are driven by its architecture, practitioners still need to learn and carry out many repetitive and error-prone activities to transform architecture into an operational blockchai…

    Submitted 24 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: Published in the Journal of Network and Computer Applications

  36. arXiv:2203.08417 [pdf, other]

    cs.SE cs.CR cs.LG

    On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

    Authors: Triet H. M. Le, M. Ali Babar

    Abstract: Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about exploitability, impact, and severity of SVs. The information is important to understand SVs and prioritize t…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted as a full paper in the technical track at the 19th International Conference on Mining Software Repositories (MSR) 2022

  37. arXiv:2203.07603 [pdf, other]

    cs.CR cs.SE

    SmartValidator: A Framework for Automatic Identification and Classification of Cyber Threat Data

    Authors: Chadni Islam, M. Ali Babar, Roland Croft, Helge Janicke

    Abstract: A wide variety of Cyber Threat Information (CTI) is used by Security Operation Centres (SOCs) to perform validation of security incidents and alerts. Security experts manually define different types of rules and scripts based on CTI to perform validation tasks. These rules and scripts need to be updated continuously due to evolving threats, changing SOCs' requirements and dynamic nature of CTI. Th…

    Submitted 14 March, 2022; originally announced March 2022.

  38. arXiv:2203.05181 [pdf, other]

    cs.CR cs.SE

    LineVD: Statement-level Vulnerability Detection using Graph Neural Networks

    Authors: David Hin, Andrey Kan, Huaming Chen, M. Ali Babar

    Abstract: Current machine-learning based software vulnerability detection methods are primarily conducted at the function-level. However, a key limitation of these methods is that they do not indicate the specific lines of code contributing to vulnerabilities. This limits the ability of developers to efficiently inspect and interpret the predictions from a learnt model, which is crucial for integrating mach…

    Submitted 25 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted in the 19th International Conference on Mining Software Repositories Technical Papers

  39. arXiv:2203.04468 [pdf, other]

    cs.SE

    Noisy Label Learning for Security Defects

    Authors: Roland Croft, M. Ali Babar, Huaming Chen

    Abstract: Data-driven software engineering processes, such as vulnerability prediction, heavily rely on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Despite the vulnerable class, the non-vulnerable modules are difficult to be verified and determined as truly exploit free given the limited manual efforts available. It…

    Submitted 1 April, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted at MSR 22

  40. arXiv:2202.09016 [pdf, other]

    cs.SE cs.HC

    Why, How and Where of Delays in Software Security Patch Management: An Empirical Investigation in the Healthcare Sector

    Authors: Nesara Dissanayake, Mansooreh Zahedi, Asangi Jayatilaka, M. Ali Babar

    Abstract: Numerous security attacks that resulted in devastating consequences can be traced back to a delay in applying a security patch. Despite the criticality of timely patch application, not much is known about why and how delays occur when applying security patches in practice, and how the delays can be mitigated. Based on longitudinal data collected from 132 delayed patching tasks over a period of fou…

    Submitted 3 September, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 28 pages, 10 figures

  41. On the Privacy of Mental Health Apps: An Empirical Investigation and its Implications for Apps Development

    Authors: Leonardo Horn Iwaya, M. Ali Babar, Awais Rashid, Chamila Wijayarathna

    Abstract: An increasing number of mental health services are offered through mobile systems, a paradigm called mHealth. Although there is an unprecedented growth in the adoption of mHealth systems, partly due to the COVID-19 pandemic, concerns about data privacy risks due to security breaches are also increasing. Whilst some studies have analyzed mHealth apps from different angles, including security, there…

    Submitted 22 January, 2022; originally announced January 2022.

    Comments: 40 pages, 13 figures

  42. arXiv:2201.08066 [pdf, other]

    cs.SE

    NLP Methods in Host-based Intrusion Detection Systems: A Systematic Review and Future Directions

    Authors: Zarrin Tasnim Sworna, Zahra Mousavi, Muhammad Ali Babar

    Abstract: Host-based Intrusion Detection System (HIDS) is an effective last line of defense for defending against cyber security attacks after perimeter defenses (e.g., Network-based Intrusion Detection System and Firewall) have failed or been bypassed. HIDS is widely adopted in the industry as HIDS is ranked among the top two most used security tools by Security Operation Centers (SOC) of organizations. Al…

    Submitted 19 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  43. arXiv:2201.07959

    cs.CR cs.SE

    APIRO: A Framework for Automated Security Tools API Recommendation

    Authors: Zarrin Tasnim Sworna, Chadni Islam, Muhammad Ali Babar

    Abstract: Security Orchestration, Automation, and Response (SOAR) platforms integrate and orchestrate a wide variety of security tools to accelerate the operational activities of Security Operation Center (SOC). Integration of security tools in a SOAR platform is mostly done manually using APIs, plugins, and scripts. SOC teams need to navigate through API calls of different security tools to find a suitable…

    Submitted 19 January, 2022; originally announced January 2022.

  44. arXiv:2201.04736

    cs.CR cs.SE

    Security for Machine Learning-based Software Systems: a survey of threats, practices and challenges

    Authors: Huaming Chen, M. Ali Babar

    Abstract: The rapid development of Machine Learning (ML) has demonstrated superior performance in many areas, such as computer vision, video and speech recognition. It has now been increasingly leveraged in software systems to automate the core tasks. However, how to securely develop the machine learning-based modern software systems (MLBSS) remains a big challenge, for which the insufficient consideration…

    Submitted 17 December, 2023; v1 submitted 12 January, 2022; originally announced January 2022.

    Comments: Accepted at ACM Computing Surveys

  45. arXiv:2201.01972

    cs.DC

    A Framework for Energy-aware Evaluation of Distributed Data Processing Platforms in Edge-Cloud Environment

    Authors: Faheem Ullah, Imaduddin Mohammed, M. Ali Babar

    Abstract: Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to distribute the storage and processing of data among computing nodes of a cloud. The centralization of cloud resources has given birth to edge computing, which enables the processing of data closer to the data source instead of sending it to the cloud. However, due to resource constraints such as energy limita…

    Submitted 6 January, 2022; originally announced January 2022.

  46. arXiv:2201.01948

    cs.DC

    Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

    Authors: Faheem Ullah, Shagun Dhingra, Xiaoyu Xia, M. Ali Babar

    Abstract: Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in…

    Submitted 6 January, 2022; originally announced January 2022.

  47. arXiv:2112.12597

    cs.CR cs.SE

    Well Begun is Half Done: An Empirical Study of Exploitability & Impact of Base-Image Vulnerabilities

    Authors: Mubin Ul Haque, M. Ali Babar

    Abstract: Container technology, (e.g., Docker) is being widely adopted for deploying software infrastructures or applications in the form of container images. Security vulnerabilities in the container images are a primary concern for developing containerized software. Exploitation of the vulnerabilities could result in disastrous impact, such as loss of confidentiality, integrity, and availability of contai…

    Submitted 21 December, 2021; originally announced December 2021.

  48. arXiv:2112.12595

    cs.CR cs.SE

    KGSecConfig: A Knowledge Graph Based Approach for Secured Container Orchestrator Configuration

    Authors: Mubin Ul Haque, M. Mehdi Kholoosi, M. Ali Babar

    Abstract: Container Orchestrator (CO) is a vital technology for managing clusters of containers, which may form a virtualized infrastructure for developing and operating software systems. Like any other software system, securing CO is critical, but can be quite challenging task due to large number of configurable options. Manual configuration is not only knowledge intensive and time consuming, but also is e…

    Submitted 21 December, 2021; originally announced December 2021.

  49. arXiv:2112.10356

    cs.SE cs.CR

    An Investigation into Inconsistency of Software Vulnerability Severity across Data Sources

    Authors: Roland Croft, M. Ali Babar, Li Li

    Abstract: Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent…

    Submitted 16 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted for publication in SANER 22

  50. arXiv:2112.10354

    cs.CR

    Systematic Literature Review on Cyber Situational Awareness Visualizations

    Authors: Liuyue Jiang, Asangi Jayatilaka, Mehwish Nasim, Marthie Grobler, Mansooreh Zahedi, M. Ali Babar

    Abstract: The dynamics of cyber threats are increasingly complex, making it more challenging than ever for organizations to obtain in-depth insights into their cyber security status. Therefore, organizations rely on Cyber Situational Awareness (CSA) to support them in better understanding the threats and associated impacts of cyber events. Due to the heterogeneity and complexity of cyber security data, ofte…

    Submitted 24 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.