-
System Safety Monitoring of Learned Components Using Temporal Metric Forecasting
Authors:
Sepehr Sharifi,
Andrea Stocco,
Lionel C. Briand
Abstract:
In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure their outputs do not lead to system safety violations, given the operational context of the system. However, develo** a safety monitor for practical deployment in real-world applications is challenging. This is due to limited access to internal workings and training data of the learned component.…
▽ More
In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure their outputs do not lead to system safety violations, given the operational context of the system. However, develo** a safety monitor for practical deployment in real-world applications is challenging. This is due to limited access to internal workings and training data of the learned component. Furthermore, safety monitors should predict safety violations with low latency, while consuming a reasonable amount of computation.
To address the challenges, we propose a safety monitoring method based on probabilistic time series forecasting. Given the learned component outputs and an operational context, we empirically investigate different Deep Learning (DL)-based probabilistic forecasting to predict the objective measure capturing the satisfaction or violation of a safety requirement (safety metric). We empirically evaluate safety metric and violation prediction accuracy, and inference latency and resource usage of four state-of-the-art models, with varying horizons, using an autonomous aviation case study. Our results suggest that probabilistic forecasting of safety metrics, given learned component outputs and scenarios, is effective for safety monitoring. Furthermore, for the autonomous aviation case study, Temporal Fusion Transformer (TFT) was the most accurate model for predicting imminent safety violations, with acceptable latency and resource consumption.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
AIM: Automated Input Set Minimization for Metamorphic Security Testing
Authors:
Nazanin Bayati Chaleshtari,
Yoann Marquer,
Fabrizio Pastore,
Lionel C. Briand
Abstract:
Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., distinguishing correct from incorrect outputs, remain preliminary. Specifically, previous work has demonstrated the potential of metamorphic testing; indeed, security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs…
▽ More
Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., distinguishing correct from incorrect outputs, remain preliminary. Specifically, previous work has demonstrated the potential of metamorphic testing; indeed, security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs. However, without further guidance, metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical.
We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black box approach, to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm able to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving vulnerability detection. Furthermore, AIM outperformed all the considered baselines regarding vulnerability coverage.
△ Less
Submitted 21 February, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Learning-Based Relaxation of Completeness Requirements for Data Entry Forms
Authors:
Hichem Belgacem,
Xiaochen Li,
Domenico Bianculli,
Lionel C. Briand
Abstract:
Data entry forms use completeness requirements to specify the fields that are required or optional to fill for collecting necessary information from different types of users.
However, some required fields may not be applicable for certain types of users anymore. Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete required fields.
Since obsol…
▽ More
Data entry forms use completeness requirements to specify the fields that are required or optional to fill for collecting necessary information from different types of users.
However, some required fields may not be applicable for certain types of users anymore. Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete required fields.
Since obsolete required fields usually have not-null validation checks before submitting the form, users have to enter meaningless values in such fields in order to complete the form submission. These meaningless values threaten the quality of the filled data. To avoid users filling meaningless values, existing techniques usually rely on manually written rules to identify the obsolete required fields and relax their completeness requirements. However, these techniques are ineffective and costly. In this paper, we propose LACQUER, a learning-based automated approach for relaxing the completeness requirements of data entry forms. LACQUER builds Bayesian Network models to automatically learn conditions under which users had to fill meaningless values. To improve its learning ability, LACQUER identifies the cases where a required field is only applicable for a small group of users, and uses SMOTE, an oversampling technique, to generate more instances on such fields for effectively mining dependencies on them. Our experimental results show that LACQUER can accurately relax the completeness requirements of required fields in data entry forms with precision values ranging between 0.76 and 0.90 on different datasets. LACQUER can prevent users from filling 20% to 64% of meaningless values, with negative predictive values between 0.72 and 0.91. Furthermore, LACQUER is efficient; it takes at most 839 ms to predict the completeness requirement of an instance.
△ Less
Submitted 13 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents
Authors:
Amirhossein Zolfagharian,
Manel Abdellatif,
Lionel C. Briand,
Ramesh S
Abstract:
Deep reinforcement learning algorithms (DRL) are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based…
▽ More
Deep reinforcement learning algorithms (DRL) are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is designed to be black-box (as it does not require access to the internals or training data of the agent) and leverages state abstraction to reduce the state space and thus facilitate the learning of safety violation prediction models from agent's states. We validated SMARLA on two well-known RL case studies. Empirical analysis reveals that SMARLA achieves accurate violation prediction with a low false positive rate, and can predict safety violations at an early stage, approximately halfway through the agent's execution before violations occur.
△ Less
Submitted 31 January, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Automated Smell Detection and Recommendation in Natural Language Requirements
Authors:
Alvaro Veizaga,
Seung Yeob Shin,
Lionel C. Briand
Abstract:
Requirement specifications are typically written in natural language (NL) due to its usability across multiple domains and understandability by all stakeholders. However, unstructured NL is prone to quality problems (e.g., ambiguity) when writing requirements, which can result in project failures. To address this issue, we present a tool, named Paska, that takes as input any NL requirements, autom…
▽ More
Requirement specifications are typically written in natural language (NL) due to its usability across multiple domains and understandability by all stakeholders. However, unstructured NL is prone to quality problems (e.g., ambiguity) when writing requirements, which can result in project failures. To address this issue, we present a tool, named Paska, that takes as input any NL requirements, automatically detects quality problems as smells in the requirements, and offers recommendations to improve their quality. Our approach relies on natural language processing (NLP) techniques and a state-of-the-art controlled natural language (CNL) for requirements (Rimay), to detect smells and suggest recommendations using patterns defined in Rimay to improve requirement quality. We evaluated Paska through an industrial case study in the financial domain involving 13 systems and 2725 annotated requirements. The results show that our tool is accurate in detecting smells (89% precision and recall) and suggesting appropriate Rimay pattern recommendations (96% precision and 94% recall).
△ Less
Submitted 25 November, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Identifying the Hazard Boundary of ML-enabled Autonomous Systems Using Cooperative Co-Evolutionary Search
Authors:
Sepehr Sharifi,
Donghwan Shin,
Lionel C. Briand,
Nathan Aschbacher
Abstract:
In Machine Learning (ML)-enabled autonomous systems (MLASs), it is essential to identify the hazard boundary of ML Components (MLCs) in the MLAS under analysis. Given that such boundary captures the conditions in terms of MLC behavior and system context that can lead to hazards, it can then be used to, for example, build a safety monitor that can take any predefined fallback mechanisms at runtime…
▽ More
In Machine Learning (ML)-enabled autonomous systems (MLASs), it is essential to identify the hazard boundary of ML Components (MLCs) in the MLAS under analysis. Given that such boundary captures the conditions in terms of MLC behavior and system context that can lead to hazards, it can then be used to, for example, build a safety monitor that can take any predefined fallback mechanisms at runtime when reaching the hazard boundary. However, determining such hazard boundary for an ML component is challenging. This is due to the problem space combining system contexts (i.e., scenarios) and MLC behaviors (i.e., inputs and outputs) being far too large for exhaustive exploration and even to handle using conventional metaheuristics, such as genetic algorithms. Additionally, the high computational cost of simulations required to determine any MLAS safety violations makes the problem even more challenging. Furthermore, it is unrealistic to consider a region in the problem space deterministically safe or unsafe due to the uncontrollable parameters in simulations and the non-linear behaviors of ML models (e.g., deep neural networks) in the MLAS under analysis. To address the challenges, we propose MLCSHE (ML Component Safety Hazard Envelope), a novel method based on a Cooperative Co-Evolutionary Algorithm (CCEA), which aims to tackle a high-dimensional problem by decomposing it into two lower-dimensional search subproblems. Moreover, we take a probabilistic view of safe and unsafe regions and define a novel fitness function to measure the distance from the probabilistic hazard boundary and thus drive the search effectively. We evaluate the effectiveness and efficiency of MLCSHE on a complex Autonomous Vehicle (AV) case study. Our evaluation results show that MLCSHE is significantly more effective and efficient compared to a standard genetic algorithm and random search.
△ Less
Submitted 16 October, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Learning Failure-Inducing Models for Testing Software-Defined Networks
Authors:
Raphaël Ollando,
Seung Yeob Shin,
Lionel C. Briand
Abstract:
Software-defined networks (SDN) enable flexible and effective communication systems that are managed by centralized software controllers. However, such a controller can undermine the underlying communication network of an SDN-based system and thus must be carefully tested. When an SDN-based system fails, in order to address such a failure, engineers need to precisely understand the conditions unde…
▽ More
Software-defined networks (SDN) enable flexible and effective communication systems that are managed by centralized software controllers. However, such a controller can undermine the underlying communication network of an SDN-based system and thus must be carefully tested. When an SDN-based system fails, in order to address such a failure, engineers need to precisely understand the conditions under which it occurs. In this article, we introduce a machine learning-guided fuzzing method, named FuzzSDN, aiming at both (1) generating effective test data leading to failures in SDN-based systems and (2) learning accurate failure-inducing models that characterize conditions under which such system fails. To our knowledge, no existing work simultaneously addresses these two objectives for SDNs. We evaluate FuzzSDN by applying it to systems controlled by two open-source SDN controllers. Further, we compare FuzzSDN with two state-of-the-art methods for fuzzing SDNs and two baselines for learning failure-inducing models. Our results show that (1) compared to the state-of-the-art methods, FuzzSDN generates at least 12 times more failures, within the same time budget, with a controller that is fairly robust to fuzzing and (2) our failure-inducing models have, on average, a precision of 98% and a recall of 86%, significantly outperforming the baselines.
△ Less
Submitted 8 January, 2024; v1 submitted 27 October, 2022;
originally announced October 2022.
-
NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR
Authors:
Orlando Amaral,
Muhammad Ilyas Azeem,
Sallam Abualhaija,
Lionel C Briand
Abstract:
Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA co…
▽ More
Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares it against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach has thus an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.
△ Less
Submitted 18 June, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Metamorphic Testing for Web System Security
Authors:
Nazanin Bayati Chaleshtari,
Fabrizio Pastore,
Arda Goknil,
Lionel C. Briand
Abstract:
Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system with a very large set of inputs. Full automation is thus required to lower costs and increase the effectiveness of security testing. Unfortunately, to achieve such automation, in addition to strat…
▽ More
Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system with a very large set of inputs. Full automation is thus required to lower costs and increase the effectiveness of security testing. Unfortunately, to achieve such automation, in addition to strategies for automatically deriving test inputs, we need to address the oracle problem, which refers to the challenge, given an input for a system, of distinguishing correct from incorrect behavior. In this paper, we propose Metamorphic Security Testing for Web-interactions (MST-wi), a metamorphic testing approach that integrates test input generation strategies inspired by mutational fuzzing and alleviates the oracle problem in security testing. It enables engineers to specify metamorphic relations (MRs) that capture many security properties of Web systems. To facilitate the specification of such MRs, we provide a domain-specific language accompanied by an Eclipse editor. MST-wi automatically collects the input data and transforms the MRs into executable Java code to automatically perform security testing. It automatically tests Web systems to detect vulnerabilities based on the relations and collected data. We provide a catalog of 76 system-agnostic MRs to automate security testing in Web systems. It covers 39% of the OWASP security testing activities not automated by state-of-the-art techniques; further, our MRs can automatically discover 102 different types of vulnerabilities, which correspond to 45% of the vulnerabilities due to violations of security design principles according to the MITRE CWE database. We also define guidelines that enable test engineers to improve the testability of the system under test with respect to our approach.
△ Less
Submitted 7 March, 2023; v1 submitted 19 August, 2022;
originally announced August 2022.
-
A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms
Authors:
Hichem Belgacem,
Xiaochen Li,
Domenico Bianculli,
Lionel C. Briand
Abstract:
Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options.
In this paper,…
▽ More
Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options.
In this paper, we propose LAFF, a learning-based automated approach for filling categorical fields in data entry forms. LAFF first builds Bayesian Network models by learning field dependencies from a set of historical input instances, representing the values of the fields that have been filled in the past. To improve its learning ability, LAFF uses local modeling to effectively mine the local dependencies of fields in a cluster of input instances. During the form filling phase, LAFF uses such models to predict possible values of a target field, based on the values in the already-filled fields of the form and their dependencies; the predicted values (endorsed based on field dependencies and prediction confidence) are then provided to the end-user as a list of suggestions.
We evaluated LAFF by assessing its effectiveness and efficiency in form filling on two datasets, one of them proprietary from the banking domain. Experimental results show that LAFF is able to provide accurate suggestions with a Mean Reciprocal Rank value above 0.73. Furthermore, LAFF is efficient, requiring at most 317 ms per suggestion.
△ Less
Submitted 28 April, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
An AI-based Approach for Tracing Content Requirements in Financial Documents
Authors:
Xiaochen Li,
Domenico Bianculli,
Lionel C. Briand
Abstract:
The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time for carefully checking every financial document based on the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue condit…
▽ More
The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time for carefully checking every financial document based on the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue conditions). Although several techniques have been proposed to automatically detect certain types of information in documents in various application domains, they provide limited support to help regulators automatically identify the text chunks related to financial information types, due to the complexity of financial documents and the diversity of the sentences characterizing an information type. In this paper, we propose FITI, an artificial intelligence (AI)-based method for tracing content requirements in financial documents. Given a new financial document, FITI selects a set of candidate sentences for efficient information type identification. Then, FITI uses a combination of rule-based and data-centric approaches, by leveraging information retrieval (IR) and machine learning (ML) techniques that analyze the words, sentences, and contexts related to an information type, to rank candidate sentences. Finally, using a list of indicator phrases related to each information type, a heuristic-based selector, which considers both the sentence ranking and the domain-specific phrases, determines a list of sentences corresponding to each information type. We evaluated FITI by assessing its effectiveness in tracing financial content requirements in 100 financial documents. Experimental results show that FITI provides accurate identification with average precision and recall values of 0.824 and 0.646, respectively. Furthermore, FITI can detect about 80% of missing information types in financial documents.
△ Less
Submitted 28 October, 2021;
originally announced October 2021.
-
AI-enabled Automation for Completeness Checking of Privacy Policies
Authors:
Orlando Amaral,
Sallam Abualhaija,
Damiano Torre,
Mehrdad Sabetzadeh,
Lionel C. Briand
Abstract:
Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for…
▽ More
Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisions of GDPR. Incomplete privacy policies might result in large fines on violating organization as well as incomplete privacy-related software specifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automation for the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterize the privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop an automated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machine learning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check them against the completeness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseen privacy policies, our approach detected 300 of the total of 334 violations of some completeness criteria correctly, while producing 23 false positives. The approach thus has a precision of 92.9% and recall of 89.8%. Compared to a baseline that applies keyword search only, our approach results in an improvement of 24.5% in precision and 38% in recall.
△ Less
Submitted 5 October, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Optimal Priority Assignment for Real-Time Systems: A Coevolution-Based Approach
Authors:
Jaekwon Lee,
Seung Yeob Shin,
Shiva Nejati,
Lionel C. Briand
Abstract:
In real-time systems, priorities assigned to real-time tasks determine the order of task executions, by relying on an underlying task scheduling policy. Assigning optimal priority values to tasks is critical to allow the tasks to complete their executions while maximizing safety margins from their specified deadlines. This enables real-time systems to tolerate unexpected overheads in task executio…
▽ More
In real-time systems, priorities assigned to real-time tasks determine the order of task executions, by relying on an underlying task scheduling policy. Assigning optimal priority values to tasks is critical to allow the tasks to complete their executions while maximizing safety margins from their specified deadlines. This enables real-time systems to tolerate unexpected overheads in task executions and still meet their deadlines. In practice, priority assignments result from an interactive process between the development and testing teams. In this article, we propose an automated method that aims to identify the best possible priority assignments in real-time systems, accounting for multiple objectives regarding safety margins and engineering constraints. Our approach is based on a multi-objective, competitive coevolutionary algorithm mimicking the interactive priority assignment process between the development and testing teams. We evaluate our approach by applying it to six industrial systems from different domains and several synthetic systems. The results indicate that our approach significantly outperforms both our baselines, i.e., random search and sequential search, and solutions defined by practitioners.Our approach scales to complex industrial systems as an offline analysis method that attempts to find near-optimal solutions within acceptable time, i.e., less than 16 hours.
△ Less
Submitted 21 March, 2022; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Combining Genetic Programming and Model Checking to Generate Environment Assumptions
Authors:
Khouloud Gaaloul,
Claudio Menghi,
Shiva Nejati,
Lionel C. Briand,
Yago Isasi Parache
Abstract:
Software verification may yield spurious failures when environment assumptions are not accounted for. Environment assumptions are the expectations that a system or a component makes about its operational environment and are often specified in terms of conditions over the inputs of that system or component. In this article, we propose an approach to automatically infer environment assumptions for C…
▽ More
Software verification may yield spurious failures when environment assumptions are not accounted for. Environment assumptions are the expectations that a system or a component makes about its operational environment and are often specified in terms of conditions over the inputs of that system or component. In this article, we propose an approach to automatically infer environment assumptions for Cyber-Physical Systems (CPS). Our approach improves the state-of-the-art in three different ways: First, we learn assumptions for complex CPS models involving signal and numeric variables; second, the learned assumptions include arithmetic expressions defined over multiple variables; third, we identify the trade-off between soundness and informativeness of environment assumptions and demonstrate the flexibility of our approach in prioritizing either of these criteria.
We evaluate our approach using a public domain benchmark of CPS models from Lockheed Martin and a component of a satellite control system from LuxSpace, a satellite system provider. The results show that our approach outperforms state-of-the-art techniques on learning assumptions for CPS models, and further, when applied to our industrial CPS model, our approach is able to learn assumptions that are sufficiently close to the assumptions manually developed by engineers to be of practical value.
△ Less
Submitted 6 January, 2021;
originally announced January 2021.
-
Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search (Experience Paper)
Authors:
Fitash Ul Haq,
Donghwan Shin,
Lionel C. Briand,
Thomas Stifter,
Jun Wang
Abstract:
Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances of Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless,…
▽ More
Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances of Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation have remained a challenging problem because KP-DNNs predict many independent key-points at the same time -- where each individual key-point may be critical in the targeted application -- and images can vary a great deal according to many factors.
In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites to severely mispredict, on average, more than 93% of all key-points. In comparison, random search-based test data generation can only severely mispredict 41% of them. Many of these mispredictions, however, are not avoidable and should not therefore be considered failures. We also empirically compare state-of-the-art, many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.
△ Less
Submitted 25 May, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Trace-Checking CPS Properties: Bridging the Cyber-Physical Gap
Authors:
Claudio Menghi,
Enrico Viganò,
Domenico Bianculli,
Lionel C. Briand
Abstract:
Cyber-physical systems combine software and physical components. Specification-driven trace-checking tools for CPS usually provide users with a specification language to express the requirements of interest, and an automatic procedure to check whether these requirements hold on the execution traces of a CPS. Although there exist several specification languages for CPS, they are often not sufficien…
▽ More
Cyber-physical systems combine software and physical components. Specification-driven trace-checking tools for CPS usually provide users with a specification language to express the requirements of interest, and an automatic procedure to check whether these requirements hold on the execution traces of a CPS. Although there exist several specification languages for CPS, they are often not sufficiently expressive to allow the specification of complex CPS properties related to the software and the physical components and their interactions.
In this paper, we propose (i) the Hybrid Logic of Signals (HLS), a logic-based language that allows the specification of complex CPS requirements, and (ii) ThEodorE, an efficient SMT-based trace-checking procedure. This procedure reduces the problem of checking a CPS requirement over an execution trace, to checking the satisfiability of an SMT formula.
We evaluated our contributions by using a representative industrial case study in the satellite domain. We assessed the expressiveness of HLS by considering 212 requirements of our case study. HLS could express all the 212 requirements. We also assessed the applicability of ThEodorE by running the trace-checking procedure for 747 trace-requirement combinations. ThEodorE was able to produce a verdict in 74.5% of the cases. Finally, we compared HLS and ThEodorE with other specification languages and trace-checking tools from the literature. Our results show that, from a practical standpoint, our approach offers a better trade-off between expressiveness and performance.
△ Less
Submitted 15 September, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Estimating Probabilistic Safe WCET Ranges of Real-Time Systems at Design Stages
Authors:
Jaekwon Lee,
Seung Yeob Shin,
Shiva Nejati,
Lionel C. Briand,
Yago Isasi Parache
Abstract:
Estimating worst-case execution times (WCET) is an important activity at early design stages of real-time systems. Based on WCET estimates, engineers make design and implementation decisions to ensure that task executions always complete before their specified deadlines. However, in practice, engineers often cannot provide precise point WCET estimates and prefer to provide plausible WCET ranges. G…
▽ More
Estimating worst-case execution times (WCET) is an important activity at early design stages of real-time systems. Based on WCET estimates, engineers make design and implementation decisions to ensure that task executions always complete before their specified deadlines. However, in practice, engineers often cannot provide precise point WCET estimates and prefer to provide plausible WCET ranges. Given a set of real-time tasks with such ranges, we provide an automated technique to determine for what WCET values the system is likely to meet its deadlines, and hence operate safely with a probabilistic guarantee. Our approach combines a search algorithm for generating worst-case scheduling scenarios with polynomial logistic regression for inferring probabilistic safe WCET ranges. We evaluated our approach by applying it to three industrial systems from different domains and several synthetic systems. Our approach efficiently and accurately estimates probabilistic safe WCET ranges within which deadlines are likely to be satisfied with a high degree of confidence.
△ Less
Submitted 7 June, 2022; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Signal-Based Properties of Cyber-Physical Systems: Taxonomy and Logic-based Characterization
Authors:
Chaima Boufaied,
Maris Jukss,
Domenico Bianculli,
Lionel Claude Briand,
Yago Isasi Parache
Abstract:
The behavior of a cyber-physical system (CPS) is usually defined in terms of the input and output signals processed by sensors and actuators. Requirements specifications of CPSs are typically expressed using signal-based temporal properties. Expressing such requirements is challenging, because of (1) the many features that can be used to characterize a signal behavior; (2) the broad variation in e…
▽ More
The behavior of a cyber-physical system (CPS) is usually defined in terms of the input and output signals processed by sensors and actuators. Requirements specifications of CPSs are typically expressed using signal-based temporal properties. Expressing such requirements is challenging, because of (1) the many features that can be used to characterize a signal behavior; (2) the broad variation in expressiveness of the specification languages (i.e., temporal logics) used for defining signal-based temporal properties. Thus, system and software engineers need effective guidance on selecting appropriate signal behavior types and an adequate specification language, based on the type of requirements they have to define. In this paper, we present a taxonomy of the various types of signal-based properties and provide, for each type, a comprehensive and detailed description as well as a formalization in a temporal logic. Furthermore, we review the expressiveness of state-of-the-art signal-based temporal logics in terms of the property types identified in the taxonomy. Moreover, we report on the application of our taxonomy to classify the requirements specifications of an industrial case study in the aerospace domain, in order to assess the feasibility of using the property types included in our taxonomy and the completeness of the latter.
△ Less
Submitted 28 December, 2020; v1 submitted 18 October, 2019;
originally announced October 2019.
-
Approximation-Refinement Testing of Compute-Intensive Cyber-Physical Models: An Approach Based on System Identification
Authors:
Claudio Menghi,
Shiva Nejati,
Lionel C. Briand,
Yago Isasi Parache
Abstract:
Black-box testing has been extensively applied to test models of Cyber-Physical systems (CPS) since these models are not often amenable to static and symbolic testing and verification. Black-box testing, however, requires to execute the model under test for a large number of candidate test inputs. This poses a challenge for a large and practically-important category of CPS models, known as compute…
▽ More
Black-box testing has been extensively applied to test models of Cyber-Physical systems (CPS) since these models are not often amenable to static and symbolic testing and verification. Black-box testing, however, requires to execute the model under test for a large number of candidate test inputs. This poses a challenge for a large and practically-important category of CPS models, known as compute-intensive CPS (CI-CPS) models, where a single simulation may take hours to complete. We propose a novel approach, namely ARIsTEO, to enable effective and efficient testing of CI-CPS models. Our approach embeds black-box testing into an iterative approximation-refinement loop. At the start, some sampled inputs and outputs of the CI-CPS model under test are used to generate a surrogate model that is faster to execute and can be subjected to black-box testing. Any failure-revealing test identified for the surrogate model is checked on the original model. If spurious, the test results are used to refine the surrogate model to be tested again. Otherwise, the test reveals a valid failure. We evaluated ARIsTEO by comparing it with S-Taliro, an open-source and industry-strength tool for testing CPS models. Our results, obtained based on five publicly-available CPS models, show that, on average, ARIsTEO is able to find 24% more requirements violations than S-Taliro and is 31% faster than S-Taliro in finding those violations. We further assessed the effectiveness and efficiency of ARIsTEO on a large industrial case study from the satellite domain. In contrast to S-Taliro, ARIsTEO successfully tested two different versions of this model and could identify three requirements violations, requiring four hours, on average, for each violation.
△ Less
Submitted 7 October, 2019;
originally announced October 2019.
-
Automatic Generation of Acceptance Test Cases from Use Case Specifications: an NLP-based Approach
Authors:
Chunhui Wang,
Fabrizio Pastore,
Arda Goknil,
Lionel C. Briand
Abstract:
Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since it is enforced by software standards. Test engineers need to identify all the representative test execution scenarios from requirements, determine the runtime conditions that trigger these scenar…
▽ More
Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since it is enforced by software standards. Test engineers need to identify all the representative test execution scenarios from requirements, determine the runtime conditions that trigger these scenarios, and finally provide the input data that satisfy these conditions. Given that requirements specifications are typically large and often provided in natural language, the generation of acceptance test cases tends to be expensive and error-prone.
In this paper, we present UMTG, an approach that supports the generation of executable, system-level, acceptance test cases from requirements specifications in natural language, with the goal of reducing the manual effort required to generate test cases and ensuring requirements coverage. More specifically, UMTG automates the generation of acceptance test cases based on use case specifications and a domain model for the system under test, which are commonly produced in many development environments. Unlike existing approaches, it does not impose strong restrictions on the expressiveness of use case specifications. We rely on recent advances in natural language processing to automatically identify test scenarios and to generate formal constraints that capture conditions triggering the execution of the scenarios, thus enabling the generation of test data. In two industrial case studies, UMTG automatically and correctly translated 95% of the use case specification steps into formal constraints required for test data generation; furthermore, it generated test cases that exercise not only all the test scenarios manually implemented by experts, but also some critical scenarios not previously considered.
△ Less
Submitted 18 May, 2020; v1 submitted 19 July, 2019;
originally announced July 2019.
-
Dynamic Adaptation of Software-defined Networks for IoT Systems: A Search-based Approach
Authors:
Seung Yeob Shin,
Shiva Nejati,
Mehrdad Sabetzadeh,
Lionel C. Briand,
Chetan Arora,
Frank Zimmer
Abstract:
The concept of Internet of Things (IoT) has led to the development of many complex and critical systems such as smart emergency management systems. IoT-enabled applications typically depend on a communication network for transmitting large volumes of data in unpredictable and changing environments. These networks are prone to congestion when there is a burst in demand, e.g., as an emergency situat…
▽ More
The concept of Internet of Things (IoT) has led to the development of many complex and critical systems such as smart emergency management systems. IoT-enabled applications typically depend on a communication network for transmitting large volumes of data in unpredictable and changing environments. These networks are prone to congestion when there is a burst in demand, e.g., as an emergency situation is unfolding, and therefore rely on configurable software-defined networks (SDN). In this paper, we propose a dynamic adaptive SDN configuration approach for IoT systems. The approach enables resolving congestion in real time while minimizing network utilization, data transmission delays and adaptation costs. Our approach builds on existing work in dynamic adaptive search-based software engineering (SBSE) to reconfigure an SDN while simultaneously ensuring multiple quality of service criteria. We evaluate our approach on an industrial national emergency management system, which is aimed at detecting disasters and emergencies, and facilitating recovery and rescue operations by providing first responders with a reliable communication infrastructure. Our results indicate that (1) our approach is able to efficiently and effectively adapt an SDN to dynamically resolve congestion, and (2) compared to two baseline data forwarding algorithms that are static and non-adaptive, our approach increases data transmission rate by a factor of at least 3 and decreases data loss by at least 70%.
△ Less
Submitted 15 May, 2020; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Automating System Test Case Classification and Prioritization for Use Case-Driven Testing in Product Lines
Authors:
Ines Hajri,
Arda Goknil,
Fabrizio Pastore,
Lionel C. Briand
Abstract:
Product Line Engineering (PLE) is a crucial practice in many software development environments where software systems are complex and developed for multiple customers with varying needs. At the same time, many development processes are use case-driven and this strongly influences their requirements engineering and system testing practices. In this paper, we propose, apply, and assess an automated…
▽ More
Product Line Engineering (PLE) is a crucial practice in many software development environments where software systems are complex and developed for multiple customers with varying needs. At the same time, many development processes are use case-driven and this strongly influences their requirements engineering and system testing practices. In this paper, we propose, apply, and assess an automated system test case classification and prioritization approach specifically targeting system testing in the context of use case-driven development of product families. Our approach provides: (i) automated support to classify, for a new product in a product family, relevant and valid system test cases associated with previous products, and (ii) automated prioritization of system test cases using multiple risk factors such as fault-proneness of requirements and requirements volatility in a product family. Our evaluation was performed in the context of an industrial product family in the automotive domain. Results provide empirical evidence that we propose a practical and beneficial way to classify and prioritize system test cases for industrial product lines.
△ Less
Submitted 17 June, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Evaluating Model Testing and Model Checking for Finding Requirements Violations in Simulink Models
Authors:
Shiva Nejati,
Khouloud Gaaloul,
Claudio Menghi,
Lionel C. Briand,
Stephen Foster,
David Wolfe
Abstract:
Matlab/Simulink is a development and simulation language that is widely used by the Cyber-Physical System (CPS) industry to model dynamical systems. There are two mainstream approaches to verify CPS Simulink models: model testing that attempts to identify failures in models by executing them for a number of sampled test inputs, and model checking that attempts to exhaustively check the correctness…
▽ More
Matlab/Simulink is a development and simulation language that is widely used by the Cyber-Physical System (CPS) industry to model dynamical systems. There are two mainstream approaches to verify CPS Simulink models: model testing that attempts to identify failures in models by executing them for a number of sampled test inputs, and model checking that attempts to exhaustively check the correctness of models against some given formal properties. In this paper, we present an industrial Simulink model benchmark, provide a categorization of different model types in the benchmark, describe the recurring logical patterns in the model requirements, and discuss the results of applying model checking and model testing approaches to identify requirements violations in the benchmarked models. Based on the results, we discuss the strengths and weaknesses of model testing and model checking. Our results further suggest that model checking and model testing are complementary and by combining them, we can significantly enhance the capabilities of each of these approaches individually. We conclude by providing guidelines as to how the two approaches can be best applied together.
△ Less
Submitted 9 May, 2019;
originally announced May 2019.
-
Practical Constraint Solving for Generating System Test Data
Authors:
Ghanem Soltana,
Mehrdad Sabetzadeh,
Lionel C. Briand
Abstract:
The ability to generate test data is often a necessary prerequisite for automated software testing. For the generated data to be fit for its intended purpose, the data usually has to satisfy various logical constraints. When testing is performed at a system level, these constraints tend to be complex and are typically captured in expressive formalisms based on first-order logic. Motivated by impro…
▽ More
The ability to generate test data is often a necessary prerequisite for automated software testing. For the generated data to be fit for its intended purpose, the data usually has to satisfy various logical constraints. When testing is performed at a system level, these constraints tend to be complex and are typically captured in expressive formalisms based on first-order logic. Motivated by improving the feasibility and scalability of data generation for system testing, we present a novel approach, whereby we employ a combination of metaheuristic search and Satisfiability Modulo Theories (SMT) for constraint solving. Our approach delegates constraint solving tasks to metaheuristic search and SMT in such a way as to take advantage of the complementary strengths of the two techniques. We ground our work on test data models specified in UML, with OCL used as the constraint language. We present tool support and an evaluation of our approach over three industrial case studies. The results indicate that, for complex system test data generation problems, our approach presents substantial benefits over the state of the art in terms of applicability and scalability.
△ Less
Submitted 15 May, 2020; v1 submitted 1 February, 2019;
originally announced February 2019.