-
Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks
Authors:
Mohamad Fazelnia,
Viktoria Koscinski,
Spencer Herzog,
Mehdi Mirakhorli
Abstract:
We investigate the use of Natural Language Inference (NLI) in automating requirements engineering tasks. In particular, we focus on three tasks: requirements classification, identification of requirements specification defects, and detection of conflicts in stakeholders' requirements. While previous research has demonstrated significant benefit in using NLI as a universal method for a broad spectrum of natural language processing tasks, these advantages have not been investigated within the context of software requirements engineering. Therefore, we design experiments to evaluate the use of NLI in requirements analysis. We compare the performance of NLI with a spectrum of approaches, including prompt-based models, conventional transfer learning, Large Language Models (LLMs)-powered chatbot models, and probabilistic models. Through experiments conducted under various learning settings including conventional learning and zero-shot, we demonstrate conclusively that our NLI method surpasses classical NLP methods as well as other LLMs-based and chatbot models in the analysis of requirements specifications. Additionally, we share lessons learned characterizing the learning settings that make NLI a suitable approach for automating requirements engineering tasks.
Submitted 24 April, 2024;
originally announced May 2024.
-
IPSynth: Interprocedural Program Synthesis for Software Security Implementation
Authors:
Ali Shokri,
Ibrahim Jameel Mujhid,
Mehdi Mirakhorli
Abstract:
To implement important quality attributes of software such as architectural security tactics, developers incorporate APIs of software frameworks, as building blocks, to avoid re-inventing the wheel and improve their productivity. However, this is a challenging and error-prone task, especially for novice programmers. Despite the advances in the field of API-based program synthesis, the state-of-the-art suffers from a twofold shortcoming when it comes to architectural tactic implementation tasks. First, the specification of the desired tactic must be explicitly expressed, which is beyond the knowledge of such programmers. Second, these approaches synthesize a block of code and leave to the programmer the task of breaking it down into smaller pieces, adding each piece to the proper location in the code, and establishing correct dependencies between each piece and its surrounding environment as well as the other pieces.
To mitigate these challenges, we introduce IPSynth, a novel inter-procedural program synthesis approach that automatically learns the specification of the tactic, synthesizes the tactic as inter-related code snippets, and adds them to an existing code base. We extend our first-place award-winning extended abstract recognized at the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE'21) research competition track. In this paper, we provide the details of the approach, present the results of the experimental evaluation of IPSynth, and offer analyses and insights for a more comprehensive exploration of the research topic. Moreover, we compare the results of our approach to one of the most powerful code generator tools, ChatGPT. Our results show that our approach can accurately locate corresponding spots in the program, synthesize needed code snippets, add them to the program, and outperform ChatGPT in inter-procedural tactic synthesis tasks.
Submitted 16 March, 2024;
originally announced March 2024.
-
A Landscape Study of Open Source and Proprietary Tools for Software Bill of Materials (SBOM)
Authors:
Mehdi Mirakhorli,
Derek Garcia,
Schuyler Dillon,
Kevin Laporte,
Matthew Morrison,
Henry Lu,
Viktoria Koscinski,
Christopher Enoch
Abstract:
Modern software applications heavily rely on diverse third-party components, libraries, and frameworks sourced from various vendors and open source repositories, presenting a complex challenge for securing the software supply chain. To address this complexity, the adoption of a Software Bill of Materials (SBOM) has emerged as a promising solution, offering a centralized repository that inventories all third-party components and dependencies used in an application. Recent supply chain breaches, exemplified by the SolarWinds attack, underscore the urgent need to enhance software security and mitigate vulnerability risks, with SBOMs playing a pivotal role in this endeavor by revealing potential vulnerabilities, outdated components, and unsupported elements. This research paper conducts an extensive empirical analysis to assess the current landscape of open-source and proprietary tools related to SBOM. We investigate emerging use cases in software supply chain security and identify gaps in SBOM technologies. Our analysis encompasses 84 tools, providing a snapshot of the current market and highlighting areas for improvement.
Submitted 16 February, 2024;
originally announced February 2024.
-
Sound Call Graph Construction for Java Object Deserialization
Authors:
Joanna C. S. Santos,
Mehdi Mirakhorli,
Ali Shokri
Abstract:
Object serialization and deserialization are widely used for storing and preserving objects in files, memory, or databases as well as for transporting them across machines, enabling remote interaction among processes and much more. This mechanism relies on reflection, a dynamic language feature that introduces serious challenges for static analyses. Current state-of-the-art call graph construction algorithms do not fully support object serialization/deserialization, i.e., they are unable to uncover the callback methods that are invoked when objects are serialized and deserialized. Since call graphs are a core data structure for multiple types of analysis (e.g., vulnerability detection), an appropriate analysis cannot be performed when the call graph does not capture hidden (vulnerable) paths that occur via callback methods. In this paper, we present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction. Our approach relies on taint analysis and API modeling to construct sound call graphs. We evaluated our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities. Our results show that Seneca can create sound call graphs with respect to serialization features. The resulting call graphs do not incur significant overhead and were shown to be useful for identifying vulnerable paths caused by untrusted object deserialization.
Submitted 1 November, 2023;
originally announced November 2023.
-
A Novel Approach to Identify Security Controls in Source Code
Authors:
Ahmet Okutan,
Ali Shokri,
Viktoria Koscinski,
Mohamad Fazelinia,
Mehdi Mirakhorli
Abstract:
Secure by Design has become the mainstream development approach ensuring that software systems are not vulnerable to cyberattacks. Architectural security controls need to be carefully monitored over the software development life cycle to avoid critical design flaws. Unfortunately, functional requirements usually get in the way of the security features, and the development team may not correctly address critical security requirements. Identifying tactic-related code pieces in a software project enables an efficient review of the security controls' implementation as well as a resilient software architecture. This paper enumerates a comprehensive list of commonly used security controls and creates a dataset for each one of them by pulling related and unrelated code snippets from the open API of the StackOverflow question and answer platform. It uses the state-of-the-art NLP technique Bidirectional Encoder Representations from Transformers (BERT) and the Tactic Detector from our prior work to show that code pieces that implement security controls could be identified with high confidence. The results show that our model trained on tactic-related and unrelated code snippets derived from StackOverflow is able to identify tactic-related code pieces with F-Measure values above 0.9.
Submitted 10 July, 2023;
originally announced July 2023.
-
Supporting AI/ML Security Workers through an Adversarial Techniques, Tools, and Common Knowledge (AI/ML ATT&CK) Framework
Authors:
Mohamad Fazelnia,
Ahmet Okutan,
Mehdi Mirakhorli
Abstract:
This paper focuses on supporting AI/ML Security Workers -- professionals involved in the development and deployment of secure AI-enabled software systems. It presents the AI/ML Adversarial Techniques, Tools, and Common Knowledge (AI/ML ATT&CK) framework, which enables AI/ML Security Workers to intuitively explore offensive and defensive tactics.
Submitted 9 November, 2022;
originally announced November 2022.
-
Attacks, Defenses, And Tools: A Framework To Facilitate Robust AI/ML Systems
Authors:
Mohamad Fazelnia,
Igor Khokhlov,
Mehdi Mirakhorli
Abstract:
Software systems are increasingly relying on Artificial Intelligence (AI) and Machine Learning (ML) components. The emerging popularity of AI techniques in various application domains attracts malicious actors and adversaries. Therefore, the developers of AI-enabled software systems need to take into account various novel cyber-attacks and vulnerabilities that these systems may be susceptible to. This paper presents a framework to characterize attacks and weaknesses associated with AI-enabled systems and provide mitigation techniques and defense strategies. This framework aims to support software designers in taking proactive measures in developing AI-enabled software, understanding the attack surface of such systems, and developing products that are resilient to various emerging attacks associated with ML. The developed framework covers a broad spectrum of attacks, mitigation techniques, and defensive and offensive tools. In this paper, we demonstrate the framework architecture and its major components, describe their attributes, and discuss the long-term goals of this research.
Submitted 18 February, 2022;
originally announced February 2022.
-
A Grounded Theory Based Approach to Characterize Software Attack Surfaces
Authors:
Sara Moshtari,
Ahmet Okutan,
Mehdi Mirakhorli
Abstract:
The notion of Attack Surface refers to the critical points on the boundary of a software system which are accessible from outside or contain valuable content for attackers. The ability to identify attack surface components of a software system plays a significant role in the effectiveness of vulnerability analysis approaches. Most prior works focus on vulnerability techniques that use an approximation of attack surfaces, and there have not been many attempts to create a comprehensive list of attack surface components. Although a limited number of studies have focused on attack surface analysis, they defined attack surface components based on project-specific hypotheses to evaluate the security risk of specific types of software applications. In this study, we leverage a qualitative analysis approach to empirically identify an extensive list of attack surface components. To this end, we conduct a Grounded Theory (GT) analysis on 1444 previously published vulnerability reports and weaknesses with a team of three software developers and security experts. We extract vulnerability information from two publicly available repositories: 1) Common Vulnerabilities and Exposures, and 2) Common Weakness Enumeration. We ask three key questions: where the attacks come from, what they target, and how they emerge. To help answer these questions, we define three core categories for attack surface components: Entry points, Targets, and Mechanisms. We extract attack surface concepts related to each category from the collected vulnerability information using the GT analysis and provide a comprehensive categorization that represents attack surface components of software systems from various perspectives. The comparison of the proposed attack surface model with the literature shows that, in the best case, previous works cover only 50% of the attack surface components at the network level and only 6.7% of the components at the code level.
Submitted 30 March, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
DepRes: A Tool for Resolving Fully Qualified Names and Their Dependencies
Authors:
Ali Shokri,
Mehdi Mirakhorli
Abstract:
Reusing code snippets shared by other programmers on Q&A forums (e.g., StackOverflow) is a common practice followed by software developers. However, lack of sufficient information about the fully qualified name (FQN) of identifiers in borrowed code snippets results in serious compile errors. Programmers either have to manually search for the correct FQN of identifiers, which is a tedious and error-prone process, or use tools developed to automatically identify correct FQNs. Despite the efforts made by researchers to automatically identify FQNs in code snippets, the current approaches suffer from low accuracy when it comes to practice. Moreover, while these tools focus on resolving the FQN for an identifier in a code snippet, they leave the challenge of finding the correct third-party library (i.e., dependency) implementing that FQN unresolved. Using an incorrect dependency or incorrect version of a dependency might lead to a semantic error which is not detectable by compilers. Therefore, it can result in serious damage at run time.
In this paper, we introduce DepRes, a tool that leverages a sketch-based approach to resolve FQNs in Java-based code snippets and recommend the correct dependency for each FQN. The source code, documentation, and a demo video of the DepRes tool are available from its code repository at https://github.com/SoftwareDesignLab/DepRes-Tool.
Submitted 2 August, 2021;
originally announced August 2021.
-
ArCode: A Tool for Supporting Comprehension and Implementation of Architectural Concerns
Authors:
Ali Shokri,
Mehdi Mirakhorli
Abstract:
Integrated development environments (IDEs) play an important role in supporting developers during program comprehension and completion. Many of these supportive features focus on low-level programming and debugging activities. Unfortunately, there is less support for understanding and implementing architectural concerns in the form of patterns, tactics and/or other concerns. In this paper we present ArCode, a tool designed as a plugin for a popular IDE, IntelliJ IDEA. ArCode is able to learn correct ways of using frameworks' APIs to implement architectural concerns such as Authentication and Authorization. By analyzing the programmer's code, this tool is able to find deviations from correct implementation and provide fix recommendations along with graphical demonstrations to better communicate the recommendations to the developers. We showcase how programmers can benefit from ArCode by providing an API misuse detection and API recommendation scenario for a well-known Java framework, the Java Authentication and Authorization (JAAS) security framework.
Submitted 11 March, 2021;
originally announced March 2021.
-
ArCode: Facilitating the Use of Application Frameworks to Implement Tactics and Patterns
Authors:
Ali Shokri,
Joanna C. S. Santos,
Mehdi Mirakhorli
Abstract:
Software designers and developers are increasingly relying on application frameworks as first-class design concepts. They instantiate the services that frameworks provide to implement various architectural tactics and patterns. One of the challenges in using frameworks for such tasks is the difficulty of learning and correctly using frameworks' APIs. This paper introduces a learning-based approach called ArCode to help novice programmers correctly use frameworks' APIs to implement architectural tactics and patterns. ArCode has several novel components: a graph-based approach for learning specification of a framework from a limited number of training software, a program analysis algorithm to eliminate erroneous training data, and a recommender module to help programmers use APIs correctly and identify API misuses in their programs. We evaluated our technique across two popular frameworks: JAAS security framework used for authentication and authorization tactic and Java RMI framework used to enable remote method invocation between client and server and other object-oriented patterns. Our evaluation results show (i) the feasibility of using ArCode to learn the specification of a framework; (ii) ArCode generates accurate recommendations for finding the next API call to implement an architectural tactic/pattern based on the context of the programmer's code; (iii) it accurately detects API misuses in the code that implements a tactic/pattern and provides fix recommendations. Comparison of ArCode with two prior techniques (MAPO and GrouMiner) on API recommendation and misuse detection shows that ArCode outperforms these approaches.
Submitted 16 February, 2021;
originally announced February 2021.
-
Did You Remember to Test Your Tokens?
Authors:
Danielle Gonzalez,
Michael Rath,
Mehdi Mirakhorli
Abstract:
Authentication is a critical security feature for confirming the identity of a system's users, typically implemented with help from frameworks like Spring Security. It is a complex feature which should be robustly tested at all stages of development. Unit testing is an effective technique for fine-grained verification of feature behaviors that is not widely-used to test authentication. Part of the problem is that resources to help developers unit test security features are limited. Most security testing guides recommend test cases in a "black box" or penetration testing perspective. These resources are not easily applicable to developers writing new unit tests, or who want a security-focused perspective on coverage.
In this paper, we address these issues by applying a grounded theory-based approach to identify common (unit) test cases for token authentication through analysis of 481 JUnit tests exercising Spring Security-based authentication implementations from 53 open source Java projects. The outcome of this study is a developer-friendly unit testing guide organized as a catalog of 53 test cases for token authentication, representing unique combinations of 17 scenarios, 40 conditions, and 30 expected outcomes learned from the data set in our analysis. We supplement the test guide with common test smells to avoid. To verify the accuracy and usefulness of our testing guide, we sought feedback from selected developers, some of whom authored unit tests in our dataset.
Submitted 25 June, 2020;
originally announced June 2020.
-
Automated Characterization of Software Vulnerabilities
Authors:
Danielle Gonzalez,
Holly Hastings,
Mehdi Mirakhorli
Abstract:
Preventing vulnerability exploits is a critical software maintenance task, and software engineers often rely on Common Vulnerabilities and Exposures (CVE) reports for information about vulnerable systems and libraries. These reports include descriptions, disclosure sources, and manually-populated vulnerability characteristics such as root cause from the NIST Vulnerability Description Ontology (VDO). This information needs to be complete and accurate so stakeholders of affected products can prevent and react to exploits of the reported vulnerabilities. However, characterizing each report requires significant time and expertise, which can lead to inaccurate or incomplete reports. This directly impacts stakeholders' ability to quickly and correctly maintain their affected systems. In this study, we demonstrate that VDO characteristics can be automatically detected from the textual descriptions included in CVE reports. We evaluated the performance of 6 classification algorithms with a dataset of 365 vulnerability descriptions, each mapped to 1 of 19 characteristics from the VDO. This work demonstrates that it is feasible to train classification techniques to accurately characterize vulnerabilities from their descriptions. All 6 classifiers evaluated produced accurate results, and the Support Vector Machine classifier was the best-performing individual classifier. Automating the vulnerability characterization process is a step towards ensuring stakeholders have the necessary data to effectively maintain their systems.
Submitted 30 September, 2019;
originally announced September 2019.
-
Graph-Based Method for Anomaly Prediction in Brain Network
Authors:
Jalal Mirakhorli,
Hamidreza Amindavar,
Mojgan Mirakhorli
Abstract:
Resting-state functional MRI (rs-fMRI) in functional neuroimaging techniques have improved in brain disorders, dysfunction studies via mapping the topology of the brain connections, i.e. connectopic mapping. Since, there are the slight differences between healthy and unhealthy brain regions and functions, investigation into the complex topology of functional and structural brain networks in human is a complicated task with the growth of evaluation criteria. Irregular graph deep learning applications have widely spread to understanding human cognitive functions that are linked to gene expression and related distributed spatial patterns, because the neuronal networks of the brain can hold dynamically a variety of brain solutions with different activity patterns and functional connectivity, these applications might also be involved with both node-centric and graph-centric tasks. In this paper, we performed a novel approach of individual generative model and high order graph analysis for the region of interest recognition areas of the brain which do not have a normal connection during applying certain tasks. Here, we proposed a high order framework of Graph Auto-Encoder (GAE) with a hypersphere distributer for functional data analysis in brain imaging studies that is underlying non-Euclidean structure in the learning of strong non-rigid graphs among large scale data. In addition, we distinguished the possible modes of correlations in abnormal brain connections. Our finding will show the degree of correlation between the affected regions and their simultaneous occurrence over time that can be used to diagnose brain diseases or revealing the ability of the nervous system to modify in brain topology at all angles, brain plasticity, according to input stimuli.
Submitted 17 July, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English
Authors:
Danielle Gonzalez,
Suzanne Prentice,
Mehdi Mirakhorli
Abstract:
Converting source or unit test code to English has been shown to improve the maintainability, understandability, and analysis of software and tests. Code summarizers identify important statements in the source/tests and convert them to easily understood English sentences using static analysis and NLP techniques. However, current test summarization approaches handle only a subset of the variation and customization allowed in the JUnit assert API (a critical component of test cases) which may affect the accuracy of conversions. In this paper, we present our work towards improving JUnit test summarization with a detailed process for converting a total of 45 unique JUnit assertions to English, including 37 previously-unhandled variations of the assertThat method. This process has also been implemented and released as the AssertConvert tool. Initial evaluations have shown that this tool generates English conversions that accurately represent a wide variety of assertion statements which could be used for code summarization or other NLP analyses.
Submitted 12 November, 2018;
originally announced November 2018.
-
A Large-Scale Study on the Usage of Testing Patterns that Address Maintainability Attributes (Patterns for Ease of Modification, Diagnoses, and Comprehension)
Authors:
Danielle Gonzalez,
Joanna C. S. Santos,
Andrew Popovich,
Mehdi Mirakhorli,
Mei Nagappan
Abstract:
Test case maintainability is an important concern, especially in open source and distributed development environments where projects typically have high turnover among contributors with varying backgrounds and experience, and where code ownership changes often. Similar to design patterns, patterns for unit testing promote maintainability quality attributes such as ease of diagnoses, modifiability, and comprehension. In this paper, we report the results of a large-scale study on the usage of four xUnit testing patterns which can be used to satisfy these maintainability attributes. This is a first-of-its-kind study which developed automated techniques to investigate these issues across 82,447 open source projects, and the findings provide more insight into testing practices in open source projects. Our results indicate that only 17% of projects had test cases, and of the 251 testing frameworks we studied, 93 were being used. We found that 24% of projects with test files implemented patterns that could help with maintainability, while the remainder did not use these patterns. Multiple qualitative analyses indicate that usage of patterns was an ad-hoc decision by individual developers, rather than motivated by the characteristics of the project, and that developers sometimes used alternative techniques to address maintainability concerns.
Submitted 26 April, 2017;
originally announced April 2017.