Search | arXiv e-print repository

Enhancing Manufacturing Quality Prediction Models through the Integration of Explainability Methods

Authors: Dennis Gross, Helge Spieker, Arnaud Gotlieb, Ricardo Knoblauch

Abstract: This research presents a method that utilizes explainability techniques to amplify the performance of machine learning (ML) models in forecasting the quality of milling processes, as demonstrated in this paper through a manufacturing use case. The methodology entails the initial training of ML models, followed by a fine-tuning phase where irrelevant features identified through explainability methods are eliminated. This procedural refinement results in performance enhancements, paving the way for potential reductions in manufacturing costs and a better understanding of the trained ML models. This study highlights the usefulness of explainability techniques in both explaining and optimizing predictive models in the manufacturing realm.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.16908 [pdf, other]

Towards Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations

Authors: Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker

Abstract: Understanding driving scenes and communicating automated vehicle decisions are key requirements for trustworthy automated driving. In this article, we introduce the Qualitative Explainable Graph (QXG), which is a unified symbolic and qualitative representation for scene understanding in urban mobility. The QXG enables interpreting an automated vehicle's environment using sensor data and machine learning models. It utilizes spatio-temporal graphs and qualitative constraints to extract scene semantics from raw sensor inputs, such as LiDAR and camera data, offering an interpretable scene model. A QXG can be incrementally constructed in real-time, making it a versatile tool for in-vehicle explanations across various sensor types. Our research showcases the potential of QXG, particularly in the context of automated driving, where it can rationalize decisions by linking the graph with observed actions. These explanations can serve diverse purposes, from informing passengers and alerting vulnerable road users to enabling post-hoc analysis of prior behaviors.

Submitted 25 March, 2024; originally announced March 2024.

Comments: SAE International Journal of Connected and Automated Vehicles

arXiv:2403.15065 [pdf, other]

doi 10.1145/3644032.3644458

Testing for Fault Diversity in Reinforcement Learning

Authors: Quentin Mazouni, Helge Spieker, Arnaud Gotlieb, Mathieu Acher

Abstract: Reinforcement Learning is the premier technique to approach sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among the software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none consider fault characterisation during the search for more diversity. We argue that policy testing should not find as many failures as possible (e.g., inputs that trigger similar car crashes) but rather aim at revealing as informative and diverse faults as possible in the model. In this paper, we explore the use of quality diversity optimisation to solve the problem of fault diversity in policy testing. Quality diversity (QD) optimisation is a type of evolutionary algorithm to solve hard combinatorial optimisation problems where high-quality diverse solutions are sought. We define and address the underlying challenges of adapting QD optimisation to the test of action policies. Furthermore, we compare classical QD optimisers to state-of-the-art frameworks dedicated to policy testing, both in terms of search efficiency and fault diversity. We show that QD optimisation, while being conceptually simple and generally applicable, finds effectively more diverse faults in the decision model, and conclude that QD-based policy testing is a promising approach.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures, 1 algorithm, AST @ ICSE 2024
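
Note: The quality diversity idea summarised in this abstract can be illustrated with a minimal MAP-Elites-style sketch. This is not the authors' framework: the `evaluate` function, its toy quality score, and the 4x4 behaviour grid are hypothetical stand-ins for running a policy on a test input and characterising the observed fault.

```python
import random


def evaluate(test_input):
    """Hypothetical stand-in for executing a policy on a test input.

    Returns (quality, descriptor). Quality is a toy score that peaks at
    (0.5, 0.5); descriptor is the cell of a 4x4 behaviour grid.
    """
    x, y = test_input
    quality = -((x - 0.5) ** 2) - (y - 0.5) ** 2
    descriptor = (min(int(x * 4), 3), min(int(y * 4), 3))
    return quality, descriptor


def map_elites(iterations=2000, seed=0):
    """Keep the best test input found so far for every behaviour cell."""
    rng = random.Random(seed)
    archive = {}  # descriptor -> (quality, test_input)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate an elite picked uniformly from the archive.
            _, parent = rng.choice(list(archive.values()))
            child = tuple(
                min(1.0, max(0.0, v + rng.gauss(0, 0.1))) for v in parent
            )
        else:
            # Occasionally sample a fresh random input.
            child = (rng.random(), rng.random())
        quality, descriptor = evaluate(child)
        # Store the child if its cell is empty or it beats the current elite.
        if descriptor not in archive or quality > archive[descriptor][0]:
            archive[descriptor] = (quality, child)
    return archive
```

Calling `map_elites()` returns one elite per discovered behaviour cell rather than a single best input, which is the property that separates a diversity-oriented search from a search that only maximises the number of failures.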

arXiv:2403.09668 [pdf, other]

Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations

Authors: Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker

Abstract: We present the Qualitative Explainable Graph (QXG): a unified symbolic and qualitative representation for scene understanding in urban mobility. QXG enables the interpretation of an automated vehicle's environment using sensor data and machine learning models. It leverages spatio-temporal graphs and qualitative constraints to extract scene semantics from raw sensor inputs, such as LiDAR and camera data, offering an intelligible scene model. Crucially, QXG can be incrementally constructed in real-time, making it a versatile tool for in-vehicle explanations and real-time decision-making across various sensor types. Our research showcases the transformative potential of QXG, particularly in the context of automated driving, where it elucidates decision rationales by linking the graph with vehicle actions. These explanations serve diverse purposes, from informing passengers and alerting vulnerable road users (VRUs) to enabling post-analysis of prior behaviours.

Submitted 29 January, 2024; originally announced March 2024.

Comments: Transport Research Arena (TRA) 2024

arXiv:2312.09680 [pdf, other]

A Review of Validation and Verification of Neural Network-based Policies for Sequential Decision Making

Authors: Q. Mazouni, H. Spieker, A. Gotlieb, M. Acher

Abstract: In sequential decision making, neural networks (NNs) are nowadays commonly used to represent and learn the agent's policy. This area of application has implied new software quality assessment challenges that traditional validation and verification practices are not able to handle. Subsequently, novel approaches have emerged to adapt those techniques to NN-based policies for sequential decision making. This survey paper aims at summarising these novel contributions and proposing future research directions. We conducted a literature review of recent research papers (from 2018 to the beginning of 2023), whose topics cover aspects of the test or verification of NN-based policies. The selection has been enriched by a snowballing process from the previously selected papers, in order to relax the scope of the study and provide the reader with insight into similar verification challenges and their recent solutions. In total, 18 papers were selected. Our results show evidence of increasing interest in this subject. They highlight the diversity of both the exact problems considered and the techniques used to tackle them.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures, RJCIA 2023

arXiv:2310.15586 [pdf, other]

doi 10.1109/TITS.2023.3322690

Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning

Authors: Pierre Bernabé, Arnaud Gotlieb, Bruno Legeard, Dusica Marijan, Frank Olaf Sem-Jacobsen, Helge Spieker

Abstract: In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or transshipment of illicit products, is a crucial task of the coastal administration. In the open sea, one has to rely on Automatic Identification System (AIS) messages transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions or restricting satellite positions. This paper presents a novel approach for the detection of abnormal AIS missing reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts whether a message should be received in the upcoming minute or not. Afterwards, the model reports on detected anomalies by comparing the prediction with what actually happens. Our method can process AIS messages in real-time, in particular, more than 500 million AIS messages per month, corresponding to the trajectories of more than 60 000 ships. The method is evaluated on one year of real-world data coming from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering already detected intentional AIS shutdowns.

Submitted 24 October, 2023; originally announced October 2023.

Comments: IEEE Transactions on Intelligent Transportation Systems

arXiv:2308.12755 [pdf, other]

Acquiring Qualitative Explainable Graphs for Automated Driving Scene Interpretation

Authors: Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker

Abstract: The future of automated driving (AD) is rooted in the development of robust, fair and explainable artificial intelligence methods. Upon request, automated vehicles must be able to explain their decisions to the driver and the car passengers, to the pedestrians and other vulnerable road users and potentially to external auditors in case of accidents. However, nowadays, most explainable methods still rely on quantitative analysis of the AD scene representations captured by multiple sensors. This paper proposes a novel representation of AD scenes, called Qualitative eXplainable Graph (QXG), dedicated to qualitative spatiotemporal reasoning of long-term scenes. The construction of this graph exploits the recent Qualitative Constraint Acquisition paradigm. Our experimental results on NuScenes, an open real-world multi-modal dataset, show that the Qualitative eXplainable Graph of an AD scene composed of 40 frames can be computed in real-time and is light in storage space, which makes it a potentially interesting tool for improved and more trustworthy perception and control processes in AD.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2306.01529 [pdf, other]

doi 10.1007/978-3-031-40923-3_6

Constraint-Guided Test Execution Scheduling: An Experience Report at ABB Robotics

Authors: Arnaud Gotlieb, Morten Mossige, Helge Spieker

Abstract: Automated test execution scheduling is crucial in modern software development environments, where components are frequently updated with changes that impact their integration with hardware systems. Building test schedules, which focus on the right tests and make optimal use of the available resources, both time and hardware, under consideration of vast requirements on the selection of test cases and their assignment to certain test execution machines, is a complex optimization task. Manual solutions are time-consuming and often error-prone. Furthermore, when software and hardware components and test scripts are frequently added, removed or updated, static test execution scheduling is no longer feasible and the motivation for automation taking care of dynamic changes grows. Since 2012, our work has focused on transferring technology based on constraint programming for automating the testing of industrial robotic systems at ABB Robotics. After having successfully transferred constraint satisfaction models dedicated to test case generation, we present the results of a project called DynTest whose goal is to automate the scheduling of test execution from a large test repository, on distinct industrial robots. This paper reports on our experience and lessons learned for successfully transferring constraint-based optimization models for test execution scheduling at ABB Robotics. Our experience underlines the benefits of a close collaboration between industry and academia for both parties.

Submitted 2 June, 2023; originally announced June 2023.

Comments: SafeComp 2023

arXiv:2208.12747 [pdf, ps, other]

Automatic Synthesis of Random Generators for Numerically Constrained Algebraic Recursive Types

Authors: Ghiles Ziat, Vincent Botbol, Matthieu Dien, Arnaud Gotlieb, Martin Pépin, Catherine Dubois

Abstract: In program verification, constraint-based random testing is a powerful technique which aims at generating random test cases that satisfy functional properties of a program. However, on recursive constrained data-structures (e.g., sorted lists, binary search trees, quadtrees), and, more generally, when the structures are highly constrained, generating uniformly distributed inputs is difficult. In this paper, we present Testify: a framework in which users can define algebraic data-types decorated with high-level constraints. These constraints are interpreted as membership predicates that restrict the set of inhabitants of the type. From these definitions, Testify automatically synthesises a partial specification of the program so that no function produces a value that violates the constraints (e.g., a binary search tree where nodes are improperly inserted). Our framework augments the original program with tests that check such properties. To achieve that, we automatically produce uniform random samplers that generate values which satisfy the constraints, and verify the validity of the outputs of the tested functions. By generating the shape of a recursive data-structure using Boltzmann sampling and generating evenly distributed finite domain variable values using constraint solving, our framework guarantees size-constrained uniform sampling of test cases. We provide use-cases of our framework on several key data structures that are of practical relevance for developers. Experiments show encouraging results.

Submitted 26 August, 2022; originally announced August 2022.

Comments: Paper presented at the 32nd International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2022), Tbilisi, Georgia, and Virtual, September 22-23, 2022 (arXiv:2208.04235)

Report number: LOPSTR/2022/15

arXiv:2205.00210 [pdf, ps, other]

doi 10.1609/aaai.v34i09.7084

Software Testing for Machine Learning

Authors: Dusica Marijan, Arnaud Gotlieb

Abstract: Machine learning has become prevalent across a wide variety of applications. Unfortunately, machine learning has also been shown to be susceptible to deception, leading to errors, and even fatal failures. This circumstance calls into question the widespread use of machine learning, especially in safety-critical applications, unless we are able to assure its correctness and trustworthiness properties. Software verification and testing are established techniques for assuring such properties, for example by detecting errors. However, software testing challenges for machine learning are vast and profuse - yet critical to address. This summary talk discusses the current state-of-the-art of software testing for machine learning. More specifically, it discusses six key challenge areas for software testing of machine learning systems, examines current approaches to these challenges and highlights their limitations. The paper provides a research agenda with elaborated directions for making progress toward advancing the state-of-the-art on testing of machine learning.

Submitted 30 April, 2022; originally announced May 2022.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 34(09), 13576-13582 (2020)

arXiv:2204.11039 [pdf, other]

doi 10.1016/j.infsof.2020.106473

Industry-Academia Research Collaboration in Software Engineering: The Certus Model

Authors: Dusica Marijan, Arnaud Gotlieb

Abstract: Context: Research collaborations between software engineering industry and academia can provide significant benefits to both sides, including improved innovation capacity for industry, and real-world environment for motivating and validating research ideas. However, building scalable and effective research collaborations in software engineering is known to be challenging. While such challenges can be varied and many, in this paper we focus on the challenges of achieving participative knowledge creation supported by active dialog between industry and academia and continuous commitment to joint problem solving. Objective: This paper aims to understand which elements of a successful industry-academia collaboration enable the culture of participative knowledge creation. Method: We conducted participant observation collecting qualitative data spanning 8 years of collaborative research between a software engineering research group on software V&V and the Norwegian IT sector. The collected data was analyzed and synthesized into a practical collaboration model, named the Certus Model. Results: The model is structured in seven phases, describing activities from setting up research projects to the exploitation of research results. As such, the Certus model advances other collaboration models from the literature by delineating different phases covering the complete life cycle of participative research knowledge creation. Conclusion: The Certus model describes the elements of a research collaboration process between researchers and practitioners in software engineering, grounded on the principles of research knowledge co-creation and continuous commitment to joint problem solving. The model can be applied and tested in other contexts where it may be adapted to the local context through experimentation.

Submitted 23 April, 2022; originally announced April 2022.

Journal ref: Information and Software Technology, Volume 132, 2021, 106473, ISSN 0950-5849

arXiv:2202.12139 [pdf, other]

doi 10.1109/ICSTW55395.2022.00035

Testing Deep Learning Models: A First Comparative Study of Multiple Testing Techniques

Authors: Mohit Kumar Ahuja, Arnaud Gotlieb, Helge Spieker

Abstract: Deep Learning (DL) has revolutionized the capabilities of vision-based systems (VBS) in critical applications such as autonomous driving, robotic surgery, critical infrastructure surveillance, air and maritime traffic control, etc. By analyzing images, voice, videos, or any type of complex signals, DL has considerably increased the situation awareness of these systems. At the same time, while relying more and more on trained DL models, the reliability and robustness of VBS have been challenged and it has become crucial to thoroughly test these models to assess their capabilities and potential errors. To discover faults in DL models, existing software testing methods have been adapted and refined accordingly. In this article, we provide an overview of these software testing methods, namely differential, metamorphic, mutation, and combinatorial testing, as well as adversarial perturbation testing, and review some challenges in their deployment for boosting perception systems used in VBS. We also provide a first experimental comparative study on a classical benchmark used in VBS and discuss its results.

Submitted 24 February, 2022; originally announced February 2022.

Comments: Artificial Intelligence in Software Testing 2022 workshop @ ICST 2022

Journal ref: Artificial Intelligence in Software Testing @ 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)

arXiv:2111.11871 [pdf, ps, other]

Solve Optimization Problems with Unknown Constraint Networks

Authors: Mohamed-Bachir Belaid, Arnaud Gotlieb, Nadjib Lazaar

Abstract: In most optimization problems, users have a clear understanding of the function to optimize (e.g., minimize the makespan for scheduling problems). However, the constraints may be difficult to state and their modelling often requires expertise in Constraint Programming. Active constraint acquisition has been successfully used to support non-experienced users in learning constraint networks through the generation of a sequence of queries. In this paper, we propose Learn&Optimize, a method to solve optimization problems with known objective function and unknown constraint network. It uses an active constraint acquisition algorithm which learns the unknown constraints and computes boundaries for the optimal solution during the learning process. As a result, our method allows users to solve optimization problems without learning the overall constraint network.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2111.03160 [pdf, other]

doi 10.3390/ai2040033

Predictive Machine Learning of Objective Boundaries for Solving COPs

Authors: Helge Spieker, Arnaud Gotlieb

Abstract: Solving Constraint Optimization Problems (COPs) can be dramatically simplified by boundary estimation, that is, providing tight boundaries of cost functions. By feeding a supervised Machine Learning (ML) model with data composed of known boundaries and extracted features of COPs, it is possible to train the model to estimate boundaries of a new COP instance. In this paper, we first give an overview of the existing body of knowledge on ML for Constraint Programming (CP) which learns from problem instances. Second, we introduce a boundary estimation framework that is applied as a tool to support a CP solver. Within this framework, different ML models are discussed and evaluated regarding their suitability for boundary estimation, and countermeasures to avoid infeasible estimations that prevent the solver from finding an optimal solution are shown. Third, we present an experimental study with distinct CP solvers on seven COPs. Our results show that near-optimal boundaries can be learned for these COPs with only little overhead. These estimated boundaries reduce the objective domain size by 60-88% and can help the solver to find near-optimal solutions early during search.

Submitted 4 November, 2021; originally announced November 2021.

Journal ref: AI 2021, 2, 527-551
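
Note: To make the "objective domain size" figure in this abstract concrete, the following sketch shows one way such a reduction could be computed for an integer objective variable. The formula, the function names, and the example bounds are illustrative assumptions, not taken from the paper.

```python
def domain_reduction(original, estimated):
    """Fraction by which an integer objective domain shrinks.

    Both arguments are inclusive (lower, upper) bounds. The estimate is
    clipped to the original domain so it can never widen it.
    """
    orig_size = original[1] - original[0] + 1
    lo = max(original[0], estimated[0])
    hi = min(original[1], estimated[1])
    if lo > hi:
        raise ValueError("estimated bounds do not overlap the original domain")
    return 1 - (hi - lo + 1) / orig_size


def is_admissible(estimated, optimum):
    """An estimate is only safe if it still contains the true optimum.

    Bounds that cut off the optimum would stop a solver from finding it.
    """
    return estimated[0] <= optimum <= estimated[1]
```

For a hypothetical objective domain of `(0, 999)` and estimated bounds of `(400, 599)`, `domain_reduction` reports that 80% of the domain is pruned, while `is_admissible((400, 599), 650)` is false because an optimum of 650 would fall outside the estimate.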

arXiv:2007.07768 [pdf, other]

Opening the Software Engineering Toolbox for the Assessment of Trustworthy AI

Authors: Mohit Kumar Ahuja, Mohamed-Bachir Belaid, Pierre Bernabé, Mathieu Collet, Arnaud Gotlieb, Chhagan Lal, Dusica Marijan, Sagar Sen, Aizaz Sharif, Helge Spieker

Abstract: Trustworthiness is a central requirement for the acceptance and success of human-centered artificial intelligence (AI). To deem an AI system as trustworthy, it is crucial to assess its behaviour and characteristics against a gold standard of Trustworthy AI, consisting of guidelines, requirements, or only expectations. While AI systems are highly complex, their implementations are still based on software. The software engineering community has a long-established toolbox for the assessment of software systems, especially in the context of software testing. In this paper, we argue for the application of software engineering and testing practices for the assessment of trustworthy AI. We make the connection between the seven key requirements as defined by the European Commission's AI high-level expert group and established procedures from software engineering and raise questions for future work.

Submitted 30 August, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 1st International Workshop on New Foundations for Human-Centered AI @ ECAI 2020

arXiv:2006.11560 [pdf, ps, other]

doi 10.1007/978-3-030-64580-9_33

Learning Objective Boundaries for Constraint Optimization Problems

Authors: Helge Spieker, Arnaud Gotlieb

Abstract: Constraint Optimization Problems (COPs) are often considered without sufficient knowledge on the boundaries of the objective variable to optimize. When available, tight boundaries are helpful to prune the search space or estimate problem characteristics. Finding close boundaries that correctly under- and overestimate the optimum is almost impossible without actually solving the COP. This paper introduces Bion, a novel approach for boundary estimation by learning from previously solved instances of the COP. Based on supervised machine learning, Bion is problem-specific and solver-independent and can be applied to any COP which is repeatedly solved with different data inputs. An experimental evaluation over seven realistic COPs shows that an estimation model can be trained to prune the objective variables' domains by over 80%. By evaluating the estimated boundaries with various COP solvers, we find that Bion improves the solving process for some problems, although the effect of closer bounds is generally problem-dependent.

Submitted 20 June, 2020; originally announced June 2020.

Comments: The 6th International Conference on Machine Learning, Optimization, and Data Science - LOD 2020

Journal ref: In: Nicosia G. et al. (eds) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12566. Springer, Cham

arXiv:1910.00262 [pdf, other]

doi 10.1016/j.jss.2020.110574

Adaptive Metamorphic Testing with Contextual Bandits

Authors: Helge Spieker, Arnaud Gotlieb

Abstract: Metamorphic Testing is a software testing paradigm which aims at using necessary properties of a system-under-test, called metamorphic relations, to either check its expected outputs, or to generate new test cases. Metamorphic Testing has been successful in testing programs for which a full oracle is not available or for which there are uncertainties on expected outputs, such as learning systems. In this article, we propose Adaptive Metamorphic Testing as a generalization of a simple yet powerful reinforcement learning technique, namely contextual bandits, to select one of the multiple metamorphic relations available for a program. By using contextual bandits, Adaptive Metamorphic Testing learns which metamorphic relations are likely to transform a source test case, such that it has a higher chance to discover faults. We present experimental results over two major case studies in machine learning, namely image classification and object detection, and identify weaknesses and robustness boundaries. Adaptive Metamorphic Testing efficiently identifies weaknesses of the tested systems in the context of the source test case.

Submitted 13 March, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Journal ref: Journal of Systems and Software (JSS) Vol. 165 (2020) 110574

arXiv:1902.04627 [pdf, ps, other]

doi 10.1007/978-3-319-66158-2_25

Time-aware Test Case Execution Scheduling for Cyber-Physical Systems

Authors: Morten Mossige, Arnaud Gotlieb, Helge Spieker, Hein Meling, Mats Carlsson

Abstract: Testing cyber-physical systems involves the execution of test cases on target-machines equipped with the latest release of a software control system. When testing industrial robots, it is common that the target machines need to share some common resources, e.g., costly hardware devices, and so there is a need to schedule test case execution on the target machines, accounting for these shared resources. With a large number of such tests executed on a regular basis, this scheduling becomes difficult to manage manually. In fact, with manual test execution planning and scheduling, some robots may remain unoccupied for long periods of time and some test cases may not be executed. This paper introduces TC-Sched, a time-aware method for automated test case execution scheduling. TC-Sched uses Constraint Programming to schedule tests to run on multiple machines constrained by the tests' access to shared resources, such as measurement or networking devices. The CP model is written in SICStus Prolog and uses the Cumulatives global constraint. Given a set of test cases, a set of machines, and a set of shared resources, TC-Sched produces an execution schedule where each test is executed once with minimal time between when a source code change is committed and the test results are reported to the developer. Experiments reveal that TC-Sched can schedule 500 test cases over 100 machines in less than 4 minutes for 99.5% of the instances. In addition, TC-Sched largely outperforms simpler methods based on a greedy algorithm and is suitable for deployment on industrial robot testing.

Submitted 12 February, 2019; originally announced February 2019.

Comments: Published in the 23rd International Conference on Principles and Practice of Constraint Programming (CP 2017)

Journal ref: In: Beck J. (eds) Principles and Practice of Constraint Programming. CP 2017. Lecture Notes in Computer Science, vol 10416. Springer, Cham

arXiv:1901.04169 [pdf, ps, other]

Towards Testing of Deep Learning Systems with Training Set Reduction

Authors: Helge Spieker, Arnaud Gotlieb

Abstract: Testing the implementation of deep learning systems and their training routines is crucial to maintain a reliable code base. Modern software development employs processes, such as Continuous Integration, in which changes to the software are frequently integrated and tested. However, testing the training routines requires running them and fully training a deep learning model can be resource-intensive, when using the full data set. Using only a subset of the training data can improve test run time, but can also reduce its effectiveness. We evaluate different ways for training set reduction and their ability to mimic the characteristics of model training with the original full data set. Our results underline the usefulness of training set reduction, especially in resource-constrained environments.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1811.04122 [pdf, other]

doi 10.1145/3092703.3092709

Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration

Authors: Helge Spieker, Arnaud Gotlieb, Dusica Marijan, Morten Mossige

Abstract: Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties on the impact of committed code changes or if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI with the goal to minimize the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, previous last execution and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, the Retecs method learns to prioritize error-prone test cases higher under guidance of a reward function and by observing previous CI cycles. By applying Retecs on data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing.

Submitted 9 November, 2018; originally announced November 2018.

Comments: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA'17) (pp. 12–22). ACM

arXiv:1811.03906 [pdf, ps, other]

doi 10.1142/S0218213020600064

ITE: A Lightweight Implementation of Stratified Reasoning for Constructive Logical Operators

Authors: Arnaud Gotlieb, Dusica Marijan, Helge Spieker

Abstract: Constraint Programming (CP) is a powerful declarative programming paradigm where inference and search are interleaved to find feasible and optimal solutions to various types of constraint systems. However, handling logical connectors with constructive information in CP is notoriously difficult. This paper presents If Then Else (ITE), a lightweight implementation of stratified constructive reasoning for logical connectives. Stratification is introduced to cope with the risk of combinatorial explosion of constructing information from nested and combined logical operators. ITE is an open-source library built on top of SICStus Prolog clp(fd), which proposes various operators, including constructive disjunction and negation, constructive implication and conditional. These operators can be used to express global constraints and to benefit from constructive reasoning for more domain pruning during constraint filtering. Even though ITE is not competitive with specialized filtering algorithms available in some global constraints implementations, its expressiveness allows users to easily define well-tuned constraints with powerful deduction capabilities. Our extended experimental results show that ITE is more efficient than available generic approaches that handle logical constraint systems over finite domains.

Submitted 22 June, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: Extended journal version

Journal ref: International Journal on Artificial Intelligence Tools, Vol. 29, No. 03n04, 2060006 (2020)

arXiv:1811.03496 [pdf, ps, other]

Multi-Cycle Assignment Problems with Rotational Diversity

Authors: Helge Spieker, Arnaud Gotlieb, Morten Mossige

Abstract: Multi-cycle assignment problems address scenarios where a series of general assignment problems has to be solved sequentially. Subsequent cycles can differ from previous ones due to changing availability or creation of tasks and agents, which makes an upfront static schedule infeasible and introduces uncertainty in the task-agent assignment process. We consider the setting where, besides profit maximization, it is also desired to maintain diverse assignments for tasks and agents, such that all tasks have been assigned to all agents over subsequent cycles. This problem of multi-cycle assignment with rotational diversity is approached in two sub-problems: The outer problem which augments the original profit maximization objective with additional information about the state of rotational diversity while the inner problem solves the adjusted general assignment problem in a single execution of the model. We discuss strategies to augment the profit values and evaluate them experimentally. The method's efficacy is shown in three case studies: multi-cycle variants of the multiple knapsack and the multiple subset sum problems, and a real-world case study on the test case selection and assignment problem from the software engineering domain.

Submitted 19 December, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: Extended journal version

arXiv:1502.04645 [pdf, other]

Synthesis of Attributed Feature Models From Product Descriptions: Foundations

Authors: Guillaume Bécan, Razieh Behjati, Arnaud Gotlieb, Mathieu Acher

Abstract: Feature modeling is a widely used formalism to characterize a set of products (also called configurations). As a manual elaboration is a long and arduous task, numerous techniques have been proposed to reverse engineer feature models from various kinds of artefacts. But none of them synthesize feature attributes (or constraints over attributes) despite the practical relevance of attributes for documenting the different values across a range of products. In this report, we develop an algorithm for synthesizing attributed feature models given a set of product descriptions. We present sound, complete, and parametrizable techniques for computing all possible hierarchies, feature groups, placements of feature attributes, domain values, and constraints. We perform a complexity analysis w.r.t. number of features, attributes, configurations, and domain size. We also evaluate the scalability of our synthesis procedure using randomized configuration matrices. This report is a first step that aims to describe the foundations for synthesizing attributed feature models.

Submitted 16 February, 2015; originally announced February 2015.

arXiv:1312.0200 [pdf, other]

A Combined Approach for Constraints over Finite Domains and Arrays

Authors: Sébastien Bardin, Arnaud Gotlieb

Abstract: Arrays are ubiquitous in the context of software verification. However, effective reasoning over arrays is still rare in CP, as local reasoning is dramatically ill-conditioned for constraints over arrays. In this paper, we propose an approach combining both global symbolic reasoning and local consistency filtering in order to solve constraint systems involving arrays (with accesses, updates and size constraints) and finite-domain constraints over their elements and indexes. Our approach, named FDCC, is based on a combination of a congruence closure algorithm for the standard theory of arrays and a CP solver over finite domains. The tricky part of the work lies in the bi-directional communication mechanism between both solvers. We identify the significant information to share, and design ways to master the communication overhead. Experiments on random instances show that FDCC solves more formulas than any portfolio combination of the two solvers taken in isolation, while overhead is kept reasonable.

Submitted 1 December, 2013; originally announced December 2013.

ACM Class: I.2.3; F.3.1; F.4.1; D.2.4; D.2.5

arXiv:1308.3847 [pdf, ps, other]

Exploiting Binary Floating-Point Representations for Constraint Propagation: The Complete Unabridged Version

Authors: Roberto Bagnara, Matthieu Carlier, Roberta Gori, Arnaud Gotlieb

Abstract: Floating-point computations are quickly finding their way in the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.

Submitted 31 July, 2015; v1 submitted 18 August, 2013; originally announced August 2013.

Comments: 51 pages, 3 figures, 1 table, 1 listing

ACM Class: D.2.4; D.2.5; I.2.2; F.3.1

arXiv:1302.3290 [pdf, ps, other]

doi 10.4204/EPTCS.107.4

Constraint-based reachability

Authors: Arnaud Gotlieb, Tristan Denmat, Nadjib Lazaar

Abstract: Iterative imperative programs can be considered as infinite-state systems computing over possibly unbounded domains. Studying reachability in these systems is challenging as it requires dealing with an infinite number of states with standard backward or forward exploration strategies. An approach that we call Constraint-based reachability is proposed to address reachability problems by exploring program states using a constraint model of the whole program. The key point of the approach is to interpret imperative constructions such as conditionals, loops, array and memory manipulations with the fundamental notion of constraint over a computational domain. By combining constraint filtering and abstraction techniques, Constraint-based reachability is able to solve reachability problems which are usually outside the scope of backward or forward exploration strategies. This paper proposes an interpretation of classical filtering consistencies used in Constraint Programming as abstract domain computations, and shows how this approach can be used to produce a constraint solver that efficiently generates solutions for reachability problems that are unsolvable by other approaches.

Submitted 13 February, 2013; originally announced February 2013.

Comments: In Proceedings Infinity 2012, arXiv:1302.3105

Journal ref: EPTCS 107, 2013, pp. 25-43

arXiv:1005.2882 [pdf, ps, other]

On Testing Constraint Programs

Authors: Nadjib Lazaar, Arnaud Gotlieb, Lebbah Yahia

Abstract: The success of several constraint-based modeling languages such as OPL, ZINC, or COMET calls for better software engineering practices, particularly in the testing phase. This paper introduces a testing framework enabling automated test case generation for constraint programming. We propose a general framework of constraint program development which supposes that a first declarative and simple constraint model is available from the problem specifications analysis. Then, this model is refined using classical techniques such as constraint reformulation, surrogate and global constraint addition, or symmetry-breaking to form an improved constraint model that must be thoroughly tested before being used to address real-sized problems. We think that most of the faults are introduced in this refinement step and propose a process which takes the first declarative model as an oracle for detecting non-conformities. We derive practical test purposes from this process to automatically generate test data that exhibit non-conformities. We implemented this approach in a new tool called CPTEST that was used to automatically detect non-conformities on two classical benchmark programs, namely the Golomb rulers and the car-sequencing problem.

Submitted 17 May, 2010; originally announced May 2010.

Report number: RR-7291

arXiv:cs/0508108 [pdf, ps, other]

Proving or Disproving likely Invariants with Constraint Reasoning

Authors: Tristan Denmat, Arnaud Gotlieb, Mireille Ducasse

Abstract: A program invariant is a property that holds for every execution of the program. Recent work suggests inferring likely-only invariants via dynamic analysis. A likely invariant is a property that holds for some executions but is not guaranteed to hold for all executions. In this paper, we present work in progress addressing the challenging problem of automatically verifying that likely invariants are actual invariants. We propose a constraint-based reasoning approach that is able, unlike other approaches, to either prove or disprove likely invariants. In the latter case, our approach provides counter-examples. We illustrate the approach on a motivating example where automatically generated likely invariants are verified.

Submitted 24 August, 2005; originally announced August 2005.

Comments: In A. Serebrenik and S. Munoz-Hernandez (editors), Proceedings of the 15th Workshop on Logic-based methods in Programming Environments October 2005, Sitges. cs.PL/0508078

ACM Class: D.2.6

Showing 1–28 of 28 results for author: Gotlieb, A