Search | arXiv e-print repository

arXiv:2405.19976 [pdf, other]

Testing in the Evolving World of DL Systems: Insights from Python GitHub Projects

Authors: Qurban Ali, Oliviero Riganelli, Leonardo Mariani

Abstract: In the ever-evolving field of Deep Learning (DL), ensuring project quality and reliability remains a crucial challenge. This research investigates testing practices within DL projects in GitHub. It quantifies the adoption of testing methodologies, focusing on aspects like test automation, the types of tests (e.g., unit, integration, and system), test suite growth rate, and evolution of testing practices across different project versions. We analyze a subset of 300 carefully selected repositories based on quantitative and qualitative criteria. This study reports insights on the prevalence of testing practices in DL projects within the open-source community.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 11 pages, 3 figures, The 24th IEEE International Conference on Software Quality, Reliability, and Security (QRS) 2024

arXiv:2403.12703 [pdf, other]

ReProbe: An Architecture for Reconfigurable and Adaptive Probes

Authors: Federico Alessi, Alessandro Tundo, Marco Mobilio, Oliviero Riganelli, Leonardo Mariani

Abstract: Modern distributed systems are highly dynamic and scalable, requiring monitoring solutions that can adapt to rapid changes. Monitoring systems that rely on external probes can only achieve adaptation through expensive operations such as deployment, undeployment, and reconfiguration. This poster paper introduces ReProbes, a class of adaptive monitoring probes that can handle rapid changes in data collection strategies. ReProbe offers controllable and configurable self-adaptive capabilities for data transmission, collection, and analysis methods. The resulting architecture can effectively enhance probe adaptability when qualitatively compared to state-of-the-art monitoring solutions.

Submitted 22 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.15257 [pdf, other]

doi 10.1145/3643655.3643876

Towards Model-Driven Dashboard Generation for Systems-of-Systems

Authors: Maria Teresa Rossi, Alessandro Tundo, Leonardo Mariani

Abstract: Configuring and evolving dashboards in complex and large-scale Systems-of-Systems (SoS) can be an expensive and cumbersome task due to the many Key Performance Indicators (KPIs) that are usually collected and have to be arranged in a number of visualizations. Unfortunately, setting up dashboards is still a largely manual and error-prone task requiring extensive human intervention. This short paper describes emerging results about the definition of a model-driven technology-agnostic approach that can automatically transform a simple list of KPIs into a dashboard model, and then translate the model into an actual dashboard for a target dashboard technology. Dashboard customization can be efficiently obtained by solely modifying the abstract model representation, freeing operators from expensive interactions with actual dashboards.

Submitted 23 February, 2024; originally announced February 2024.

Journal ref: 2024 12th ACM/IEEE International Workshop on Software Engineering for Systems-of-Systems and Software Ecosystems

arXiv:2402.09022 [pdf, ps, other]

doi 10.1145/3639478.3643122

Assessing AI-Based Code Assistants in Method Generation Tasks

Authors: Vincenzo Corso, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

Abstract: AI-based code assistants are increasingly popular as a means to enhance productivity and improve code quality. This study compares four AI-based code assistants, GitHub Copilot, Tabnine, ChatGPT, and Google Bard, in method generation tasks, assessing their ability to produce accurate, correct, and efficient code. Results show that code assistants are useful, with complementary capabilities, although they rarely generate ready-to-use correct code.

Submitted 14 February, 2024; originally announced February 2024.

Journal ref: In Proceedings of the 46th International Conference on Software Engineering (ICSE 2024)

arXiv:2402.08431 [pdf, other]

doi 10.1145/3643916.3644402

Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants

Authors: Vincenzo Corso, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

Abstract: AI-based code assistants are promising tools that can facilitate and speed up code development. They exploit machine learning algorithms and natural language processing to interact with developers, suggesting code snippets (e.g., method implementations) that can be incorporated into projects. Recent studies empirically investigated the effectiveness of code assistants using simple exemplary problems (e.g., the re-implementation of well-known algorithms), which fail to capture the spectrum and nature of the tasks actually faced by developers. In this paper, we expand the knowledge in the area by comparatively assessing four popular AI-based code assistants, namely GitHub Copilot, Tabnine, ChatGPT, and Google Bard, with a dataset of 100 methods that we constructed from real-life open-source Java projects, considering a variety of cases for complexity and dependency from contextual elements. Results show that Copilot is often more accurate than other techniques, yet none of the assistants is completely subsumed by the rest of the approaches. Interestingly, the effectiveness of these solutions dramatically decreases when dealing with dependencies outside the boundaries of single classes.

Submitted 14 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024)

arXiv:2402.08430 [pdf, other]

doi 10.1145/3643916.3644409

Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot

Authors: Ionut Daniel Fagadau, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

Abstract: Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content, crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond to prompts by providing ready-to-use code snippets. Formulating the prompt appropriately, and incorporating the useful information while avoiding any information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate the influence of eight prompt features on the style and the content of prompts, on the level of correctness, complexity, size, and similarity to the developers' code of the generated code. We specifically consider the task of using Copilot with 124,800 prompts obtained by systematically combining the eight considered prompt features to generate the implementation of 200 Java methods. Results show how some prompt features, such as the presence of examples and the summary of the purpose of the method, can significantly influence the quality of the result.

Submitted 13 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024)

arXiv:2402.07460 [pdf, other]

Anonymizing Test Data in Android: Does It Hurt?

Authors: Elena Masserini, Davide Ginelli, Daniela Micucci, Daniela Briola, Leonardo Mariani

Abstract: Failure data collected from the field (e.g., failure traces, bug reports, and memory dumps) represent an invaluable source of information for developers who need to reproduce and analyze failures. Unfortunately, field data may include sensitive information and thus cannot be collected indiscriminately. Privacy-preserving techniques can address this problem by anonymizing data and reducing the risk of disclosing personal information. However, collecting anonymized information may harm reproducibility, that is, the anonymized data may not allow the reproduction of a failure observed in the field. In this paper, we present an empirical investigation about the impact of privacy-preserving techniques on the reproducibility of failures. In particular, we study how five privacy-preserving techniques may impact reproducibility for 19 bugs in 17 Android applications. Results provide insights on how to select and configure privacy-preserving techniques.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.10372 [pdf, other]

doi 10.1145/3639478.3640032

MutaBot: A Mutation Testing Approach for Chatbots

Authors: Michael Ferdinando Urrico, Diego Clerissi, Leonardo Mariani

Abstract: Mutation testing is a technique aimed at assessing the effectiveness of test suites by seeding artificial faults into programs. Although available for many platforms and languages, no mutation testing tool is currently available for conversational chatbots, which represent an increasingly popular solution to design systems that can interact with users through a natural language interface. Note that since conversations must be explicitly engineered by the developers of conversational chatbots, these systems are exposed to specific types of faults not supported by existing mutation testing tools. In this paper, we present MutaBot, a mutation testing tool for conversational chatbots. MutaBot addresses mutations at multiple levels, including conversational flows, intents, and contexts. We designed the tool to potentially target multiple platforms, while we implemented initial support for Google Dialogflow chatbots. We assessed the tool with three Dialogflow chatbots and test cases generated with Botium, revealing weaknesses in the test suites.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 5 pages, 2 figures, 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion '24)

arXiv:2309.11870 [pdf, other]

doi 10.1109/TSC.2022.3180500

Automated Probe Life-Cycle Management for Monitoring-as-a-Service

Authors: Alessandro Tundo, Marco Mobilio, Oliviero Riganelli, Leonardo Mariani

Abstract: Cloud services must be continuously monitored to guarantee that misbehaviors can be timely revealed, compensated, and fixed. While simple applications can be easily monitored and controlled, monitoring non-trivial cloud systems with dynamic behavior requires the operators to be able to rapidly adapt the set of collected indicators. Although the currently available monitoring frameworks are equipped with a rich set of probes to virtually collect any indicator, they do not provide the automation capabilities required to quickly and easily change (i.e., deploy and undeploy) the probes used to monitor a target system. Indeed, changing the collected indicators beyond standard platform-level indicators can be an error-prone and expensive process, which often requires manual intervention. This paper presents a Monitoring-as-a-Service framework that provides the capability to automatically deploy and undeploy arbitrary probes based on a user-provided set of indicators to be collected. The life-cycle of the probes is fully governed by the framework, including the detection and resolution of the erroneous states at deployment time. The framework can be used jointly with existing monitoring technologies, without requiring the adoption of a specific probing technology. We experimented with our framework on cloud systems based on containers and virtual machines, obtaining evidence of the efficiency and effectiveness of the proposed solution.

Submitted 21 September, 2023; originally announced September 2023.

Journal ref: in IEEE Transactions on Services Computing, vol. 16, no. 2, pp. 969-982, 1 March-April 2023

arXiv:2309.02985 [pdf, other]

Supporting Early-Safety Analysis of IoT Systems by Exploiting Testing Techniques

Authors: Diego Clerissi, Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Felicien Ihirwe, Leonardo Mariani, Daniela Micucci, Maria Teresa Rossi, Riccardo Rubei

Abstract: IoT systems' complexity and susceptibility to failures pose significant challenges in ensuring their reliable operation. Failures can be internally generated or caused by external factors, impacting both the system's correctness and its surrounding environment. To investigate these complexities, various modeling approaches have been proposed to raise the level of abstraction, facilitating automation and analysis. Failure-Logic Analysis (FLA) is a technique that helps predict potential failure scenarios by defining how a component's failure logic behaves and spreads throughout the system. However, manually specifying FLA rules can be arduous and error-prone, leading to incomplete or inaccurate specifications. In this paper, we propose adopting testing methodologies to improve the completeness and correctness of these rules. How failures may propagate within an IoT system can be observed by systematically injecting failures while running test cases to collect evidence useful to add, complete, and refine FLA rules.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.00022 [pdf, other]

An Energy-Aware Approach to Design Self-Adaptive AI-based Applications on the Edge

Authors: Alessandro Tundo, Marco Mobilio, Shashikant Ilager, Ivona Brandić, Ezio Bartocci, Leonardo Mariani

Abstract: The advent of edge devices dedicated to machine learning tasks enabled the execution of AI-based applications that efficiently process and classify the data acquired by the resource-constrained devices populating the Internet of Things. The proliferation of such applications (e.g., critical monitoring in smart cities) demands new strategies to make these systems also sustainable from an energetic point of view. In this paper, we present an energy-aware approach for the design and deployment of self-adaptive AI-based applications that can balance application objectives (e.g., accuracy in object detection and frames processing rate) with energy consumption. We address the problem of determining the set of configurations that can be used to self-adapt the system with a meta-heuristic search procedure that only needs a small number of empirical samples. The final set of configurations is selected using weighted gray relational analysis, and mapped to the operation modes of the self-adaptive application. We validate our approach on an AI-based application for pedestrian detection. Results show that our self-adaptive application can outperform non-adaptive baseline configurations by saving up to 81% of energy while losing only between 2% and 6% in accuracy.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2307.16185 [pdf, other]

Measuring Software Testability via Automatically Generated Test Cases

Authors: Luca Guglielmo, Leonardo Mariani, Giovanni Denaro

Abstract: Estimating software testability can crucially assist software managers to optimize test budgets and software quality. In this paper, we propose a new approach that radically differs from the traditional approach of pursuing testability measurements based on software metrics, e.g., the size of the code or the complexity of the designs. Our approach exploits automatic test generation and mutation analysis to quantify the evidence about the relative hardness of developing effective test cases. In the paper, we elaborate on the intuitions and the methodological choices that underlie our proposal for estimating testability, introduce a technique and a prototype that allows for concretely estimating testability accordingly, and discuss our findings out of a set of experiments in which we compare the performance of our estimations both against and in combination with traditional software metrics. The results show that our testability estimates capture a complementary dimension of testability that can be synergistically combined with approaches based on software metrics to improve the accuracy of predictions.

Submitted 30 July, 2023; originally announced July 2023.

Comments: 11 pages, 2 figures

arXiv:2301.13615 [pdf, other]

Property-Based Mutation Testing

Authors: Ezio Bartocci, Leonardo Mariani, Dejan Nickovic, Drishti Yadav

Abstract: Mutation testing is an established software quality assurance technique for the assessment of test suites. While it is well-suited to estimate the general fault-revealing capability of a test suite, it is not practical and informative when the software under test must be validated against specific requirements. This is often the case for embedded software, where the software is typically validated against rigorously-specified safety properties. In such a scenario (i) a mutant is relevant only if it can impact the satisfaction of the tested properties, and (ii) a mutant is meaningfully-killed with respect to a property only if it causes the violation of that property. To address these limitations of mutation testing, we introduce property-based mutation testing, a method for assessing the capability of a test suite to exercise the software with respect to a given property. We evaluate our property-based mutation testing framework on Simulink models of safety-critical Cyber-Physical Systems (CPS) from the automotive and avionic domains and demonstrate how property-based mutation testing is more informative than regular mutation testing. These results open new perspectives in both mutation testing and test case generation of CPS.

Submitted 31 January, 2023; originally announced January 2023.

Comments: Accepted at the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2023
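
The two conditions stated in the abstract above (a mutant counts only if it can affect the property, and it is killed only if the property is violated) can be illustrated with a minimal Python sketch. The `MutantResult` record, its field names, and the ratio computed by `property_based_mutation_score` are assumptions made here for illustration; the paper may define its score and its notion of relevance differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MutantResult:
    """Hypothetical record of how one mutant behaved under the test suite."""
    name: str
    can_affect_property: bool  # condition (i): the mutant is relevant to the property
    violates_property: bool    # condition (ii): some test made the property fail


def property_based_mutation_score(results: Iterable[MutantResult]) -> Optional[float]:
    """Fraction of relevant mutants that are killed with respect to the property.

    Assumed definition: killed relevant mutants / relevant mutants.
    Returns None when no mutant is relevant, since the ratio is undefined.
    """
    relevant = [r for r in results if r.can_affect_property]
    if not relevant:
        return None
    killed = sum(1 for r in relevant if r.violates_property)
    return killed / len(relevant)


if __name__ == "__main__":
    sample = [
        MutantResult("m1", can_affect_property=True, violates_property=True),
        MutantResult("m2", can_affect_property=True, violates_property=False),
        MutantResult("m3", can_affect_property=False, violates_property=False),
    ]
    print(property_based_mutation_score(sample))
```

In this sketch, a mutant that cannot affect the property is excluded from the denominator, so it neither rewards nor penalizes the test suite.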

arXiv:2210.12155 [pdf, other]

Non-Functional Testing of Runtime Enforcers in Android

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Runtime enforcers can be used to ensure that running applications satisfy desired correctness properties. Although runtime enforcers that are correct-by-construction with respect to abstract behavioral models are relatively easy to specify, the concrete software enforcers generated from these specifications may easily introduce issues in the target application. Indeed, developers can generate test suites to verify the functional behavior of the enforcers, for instance exploiting the same models used to specify them. However, it remains challenging and tedious to verify the behavior of enforcers in terms of non-functional performance characteristics. This paper describes a practical approach to reveal runtime enforcers that may introduce inefficiencies in the target application. The approach relies on a combination of automatic test generation and runtime monitoring of multiple key performance indicators. We designed our approach to reveal issues in four indicators for mobile systems: responsiveness, launch time, memory, and energy consumption. Experimental results show that our approach can detect performance issues that might be introduced by automatically generated enforcers.

Submitted 14 September, 2022; originally announced October 2022.

Comments: paper accepted at the 11th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2022). arXiv admin note: text overlap with arXiv:2010.04258

arXiv:2208.10545 [pdf, other]

Information-theoretical analysis of the neural code for decoupled face representation

Authors: Miguel Ibáñez-Berganza, Carlo Lucibello, Luca Mariani, Giovanni Pezzulo

Abstract: Processing faces accurately and efficiently is a key capability of humans and other animals that engage in sophisticated social tasks. Recent studies reported a decoupled coding for faces in the primate inferotemporal cortex, with two separate neural populations coding for the geometric position of (texture-free) facial landmarks and for the image texture at fixed landmark positions, respectively. Here, we formally assess the efficiency of this decoupled coding by appealing to the information-theoretic notion of description length, which quantifies the amount of information that is saved when encoding novel facial images, with a given precision. We show that although decoupled coding describes the facial images in terms of two sets of principal components (of landmark shape and image texture), it is more efficient (i.e., yields more information compression) than the encoding in terms of the image principal components only, which corresponds to the widely used eigenface method. The advantage of decoupled coding over eigenface coding increases with image resolution and is especially prominent when coding variants of training set images that only differ in facial expressions. Moreover, we demonstrate that decoupled coding entails better performance in three different tasks: the representation of facial images, the (daydream) sampling of novel facial images, and the recognition of facial identities and gender. In summary, our study provides a first-principles perspective on the efficiency and accuracy of the decoupled coding of facial stimuli reported in the primate inferotemporal cortex.

Submitted 18 January, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 26 pages, 8 figures (+11 pages, 7 figures in the supporting information section). In v3: new figure 8 in section 3.2.3; further details added to the supporting information; title changed

arXiv:2205.04142 [pdf, other]

doi 10.1145/3524844.3528055

Towards Self-Adaptive Peer-to-Peer Monitoring for Fog Environments

Authors: Vera Colombo, Alessandro Tundo, Michele Ciavotta, Leonardo Mariani

Abstract: Monitoring is a critical component in fog environments: it promptly provides insights about the behavior of systems, reveals Service Level Agreements (SLAs) violations, enables the autonomous orchestration of services and platforms, calls for the intervention of operators, and triggers self-healing actions. In such environments, monitoring solutions have to cope with the heterogeneity of the devices and platforms present in the Fog, the limited resources available at the edge of the network, and the high dynamism of the whole Cloud-to-Thing continuum. This paper addresses the challenge of accurately and efficiently monitoring the Fog with a self-adaptive peer-to-peer (P2P) monitoring solution that can opportunistically adjust its behavior according to the collected data exploiting a lightweight rule-based expert system. Empirical results show that adaptation can improve monitoring accuracy, while reducing network and power consumption at the cost of higher memory consumption.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.11999 [pdf, other]

doi 10.1145/3510454.3516837

Proactive Libraries: Enforcing Correct Behaviors in Android Apps

Authors: Oliviero Riganelli, Ionut Daniel Fagadau, Daniela Micucci, Leonardo Mariani

Abstract: The Android framework provides a rich set of APIs that can be exploited by developers to build their apps. However, the rapid evolution of these APIs jointly with the specific characteristics of the lifecycle of the Android components challenge developers, who may release apps that use APIs incorrectly. In this demo, we present Proactive Libraries, a tool that can be used to decorate regular libraries with the capability of proactively detecting and healing API misuses at runtime. Proactive Libraries blend libraries with multiple proactive modules that collect data, check the compliance of API usages with correctness policies, and heal executions as soon as the possible violation of a policy is detected. The results of our evaluation with 27 possible API misuses show the effectiveness of Proactive Libraries in correcting API misuses with negligible runtime overhead.

Submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted for publication in the Proceedings of the 44th International Conference on Software Engineering (ICSE 2022). arXiv admin note: substantial text overlap with arXiv:1911.09357, arXiv:1703.08005

arXiv:2201.00736 [pdf, other]

Exception-Driven Fault Localization for Automated Program Repair

Authors: Davide Ginelli, Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Automated Program Repair (APR) techniques typically exploit spectrum-based fault localization (SBFL) to identify the program locations that should be patched, making the effectiveness of APR techniques dependent on the effectiveness of fault localization. Indeed, results show that SBFL often does not localize faults accurately, hindering the effectiveness of APR. In this paper, we propose EXCEPT, a technique that addresses the localization problem by focusing on the semantics of failures rather than on the correlation between the executed statements and the failed tests, as SBFL does. We focus on failures due to exceptions and we exploit their type and source to localize and guess the faults. Experiments with 43 exception-raising faults from the Defects4J benchmark show that EXCEPT can perform better than Ochiai and ssFix.

Submitted 3 January, 2022; originally announced January 2022.

Comments: In Proc. of the IEEE International Conference on Software Quality, Reliability and Security (QRS 2021). For associated video presentation, see https://youtu.be/PulKnHk-kp4

arXiv:2110.03431 [pdf, other]

Cloud Failure Prediction with Hierarchical Temporal Memory: An Empirical Assessment

Authors: Oliviero Riganelli, Paolo Saltarel, Alessandro Tundo, Marco Mobilio, Leonardo Mariani

Abstract: Hierarchical Temporal Memory (HTM) is an unsupervised learning algorithm inspired by the features of the neocortex that can be used to continuously process stream data and detect anomalies, without requiring either a large amount of training data or labeled data. HTM is also able to continuously learn from samples, providing a model that is always up-to-date with respect to observations. These characteristics make HTM particularly suitable for supporting online failure prediction in cloud systems, which are systems with a dynamically changing behavior that must be monitored to anticipate problems. This paper presents the first systematic study that assesses HTM in the context of failure prediction. The results that we obtained considering 72 configurations of HTM applied to 12 different types of faults introduced in the Clearwater cloud system show that HTM can help to predict failures with sufficient effectiveness (F-measure = 0.76), representing an interesting practical alternative to (semi-)supervised algorithms.

Submitted 15 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: For associated video presentation, see https://youtu.be/Y6hTjEYAa-w , for associated slides, see https://www.slideshare.net/OlivieroRiganelli/cloud-failure-prediction-with-hierarchical-temporal-memory-an-empirical-assessment . In Proc. of the IEEE International Conference on Machine Learning and Applications (ICMLA 2021)

arXiv:2104.05233 [pdf, other]

An Evolutionary Approach to Adapt Tests Across Mobile Apps

Authors: Leonardo Mariani, Mauro Pezzè, Valerio Terragni, Daniele Zuddas

Abstract: Automatic generators of GUI tests often fail to generate semantically relevant test cases, and thus miss important test scenarios. To address this issue, test adaptation techniques can be used to automatically generate semantically meaningful GUI tests from test cases of applications with similar functionalities. In this paper, we present ADAPTDROID, a technique that approaches the test adaptation problem as a search problem, and uses evolutionary testing to adapt GUI tests (including oracles) across similar Android apps. In our evaluation with 32 popular Android apps, ADAPTDROID successfully adapted semantically relevant test cases in 11 out of 20 cross-app adaptation scenarios.

Submitted 12 April, 2021; originally announced April 2021.

arXiv:2103.00465 [pdf, other]

On Introducing Automatic Test Case Generation in Practice: A Success Story and Lessons Learned

Authors: Matteo Brunetto, Giovanni Denaro, Leonardo Mariani, Mauro Pezzè

Abstract: The level and quality of automation dramatically affect software testing activities, determine costs and effectiveness of the testing process, and largely impact the quality of the final product. While costs and benefits of automating many testing activities in industrial practice (including managing the quality process, executing large test suites, and managing regression test suites) are well understood and documented, the benefits and obstacles of automatically generating system test suites in industrial practice are not well reported yet, despite the recent progress of automated test case generation tools. Proprietary tools for automatically generating test cases are becoming common practice in large software organisations, and commercial tools are becoming available for some application domains and testing levels. However, generating system test cases in small and medium-size software companies is still largely a manual, inefficient and ad-hoc activity. This paper reports our experience in introducing techniques for automatically generating system test suites in a medium-size company. We describe the technical and organisational obstacles that we faced when introducing automatic test case generation in the development process of the company, and present the solutions that we successfully experienced in that context.
In particular, the paper discusses the problems of automating the generation of test cases by referring to a customised ERP application that the medium-size company developed for a third party multinational company, and presents ABT2.0, the test case generator that we developed by tailoring ABT, a research state-of-the-art GUI test generator, to their industrial environment. This paper presents the new features of ABT2.0, and discusses how these new features address the issues that we faced.

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2101.00274 [pdf, other]

Declarative Dashboard Generation

Authors: Alessandro Tundo, Chiara Castelnovo, Marco Mobilio, Oliviero Riganelli, Leonardo Mariani

Abstract: Systems of systems are highly dynamic software systems that require flexible monitoring solutions to be observed and controlled. Indeed, operators have to frequently adapt the set of collected indicators according to changing circumstances, to visualize the behavior of the monitored systems and timely take actions, if needed. Unfortunately, dashboard systems are still quite cumbersome to configure and adapt to a changing set of indicators that must be visualized. This paper reports our initial effort towards the definition of an automatic dashboard generation process that exploits metamodel layouts to create a full dashboard from a set of indicators selected by operators.

Submitted 1 January, 2021; originally announced January 2021.

Comments: A. Tundo, C. Castelnovo, M. Mobilio, O. Riganelli and L. Mariani. (2020). Declarative Dashboard Generation. In Proc. of the International Workshop on Governing Adaptive and Unplanned Systems of Systems Co-located with the 31st International Symposium on Software Reliability Engineering (ISSRE 2020)

arXiv:2012.15627 [pdf, other]

doi 10.1145/3324884.3415290

FILO: FIx-LOcus Localization for Backward Incompatibilities Caused by Android Framework Upgrades

Authors: Marco Mobilio, Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Mobile operating systems evolve quickly, frequently updating the APIs that app developers use to build their apps. Unfortunately, API updates do not always guarantee backward compatibility, causing apps to no longer work properly or even crash when running with an updated system. This paper presents FILO, a tool that assists Android developers in resolving backward compatibility issues introduced by API upgrades. FILO both suggests the method that needs to be modified in the app in order to adapt the app to an upgraded API, and reports key symptoms observed in the failed execution to facilitate the fixing activity. Results obtained with the analysis of 12 actual upgrade problems and the feedback produced by early tool adopters show that FILO can practically support Android developers. FILO can be downloaded from https://gitlab.com/learnERC/filo, and its video demonstration is available at https://youtu.be/WDvkKj-wnlQ.

Submitted 31 December, 2020; originally announced December 2020.

Comments: For associated video, see https://youtu.be/WDvkKj-wnlQ ; FILO source code can be downloaded from https://gitlab.com/learnERC/filo . In the International Conference on Automated Software Engineering (2020)

arXiv:2012.06264 [pdf, ps, other]

doi 10.1007/s10664-021-10100-7

A Comprehensive Study of Code-removal Patches in Automated Program Repair

Authors: Davide Ginelli, Matias Martinez, Leonardo Mariani, Martin Monperrus

Abstract: Automatic Program Repair (APR) techniques can promisingly help reduce the cost of debugging. Many relevant APR techniques follow the generate-and-validate approach, that is, the faulty program is iteratively modified with different change operators and then validated with a test suite until a plausible patch is generated. In particular, Kali is a generate-and-validate technique developed to investigate the possibility of generating plausible patches by only removing code. Former studies show that indeed Kali successfully addressed several faults. This paper addresses the case of code-removal patches in automated program repair, investigating the reasons and the scenarios that make their creation possible, and the relationship with patches implemented by developers. Our study reveals that code-removal patches are often insufficient to fix bugs, and proposes a comprehensive taxonomy of code-removal patches that provides evidence of the problems that may affect test suites, opening new opportunities for researchers in the field of automatic program repair.

Submitted 15 December, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: New version of the manuscript

Journal ref: Empirical Software Engineering, Springer, 2022

arXiv:2010.05584 [pdf, other]

doi 10.1145/3395363.3397379

Data Loss Detector: Automatically Revealing Data Loss Bugs in Android Apps

Authors: Oliviero Riganelli, Simone Paolo Mottadelli, Claudio Rota, Daniela Micucci, Leonardo Mariani

Abstract: Android apps must work correctly even if their execution is interrupted by external events. For instance, an app must work properly even if a phone call is received, or after its layout is redrawn because the smartphone has been rotated. Since these events may require destroying the foreground activity of the app when the execution is interrupted and recreating it when the execution is resumed, the only way to prevent the loss of state information is saving and restoring it. This behavior must be explicitly implemented by app developers, who often fail to implement it properly, releasing apps affected by data loss problems, that is, apps that may lose state information when their execution is interrupted. Although several techniques can be used to automatically generate test cases for Android apps, the obtained test cases seldom include the interactions and the checks necessary to exercise and reveal data loss faults. To address this problem, this paper presents Data Loss Detector (DLD), a test case generation technique that integrates an exploration strategy, data-loss-revealing actions, and two customized oracle strategies for the detection of data loss failures. DLD has been able to reveal 75% of the faults in a benchmark of 54 Android app releases affected by 110 known data loss faults. DLD also revealed unknown data loss problems, outperforming competing approaches.

Submitted 12 October, 2020; originally announced October 2020.

Comments: For associated video presentation, see https://youtu.be/s6XZ7F8L3nY ; for associated slides, see https://www.slideshare.net/OlivieroRiganelli/oliviero-riganelli-data-loss-detector-automatically-revealing-data-loss-bugs-in-android-apps . In Proc. of the International Symposium on Software Testing and Analysis (ISSTA 2020)

arXiv:2010.04258 [pdf, other]

doi 10.1007/978-3-030-60508-7_15

Test4Enforcers: Test Case Generation for Software Enforcers

Authors: Michell Guzman, Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Software enforcers can be used to modify the runtime behavior of software applications to guarantee that relevant correctness policies are satisfied. Indeed, the implementation of software enforcers can be tricky, due to the heterogeneity of the situations that they must be able to handle. Assessing their ability to steer the behavior of the target system without introducing any side effect is an important challenge to fully trust the resulting system. To address this challenge, this paper presents Test4Enforcers, the first approach to derive thorough test suites that can validate the impact of enforcers on a target system. The paper also shows how to implement the Test4Enforcers approach in the DroidBot test generator to validate enforcers for Android apps.

Submitted 13 October, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: For associated video presentation, see https://youtu.be/TcOTlo4ILmo. For associated slides, see https://www.slideshare.net/OlivieroRiganelli/test4enforcers-test-case-generation-for-software-enforcers

Journal ref: In: Runtime Verification. RV 2020. Lecture Notes in Computer Science, vol 12399. Springer, Cham

arXiv:2002.01872 [pdf, other]

CBR: Controlled Burst Recording

Authors: Oscar Cornejo, Daniela Briola, Daniela Micucci, Leonardo Mariani

Abstract: Collecting traces from software running in the field is both useful and challenging. Traces may indeed help reveal unexpected usage scenarios, detect and reproduce failures, and build behavioral models that reflect how the software is actually used. On the other hand, recording traces is an intrusive activity that, if not properly designed, may annoy users and negatively affect the usability of the applications. In this paper we address field monitoring by introducing Controlled Burst Recording, a monitoring solution that can collect comprehensive runtime data without compromising the quality of the user experience. The technique encodes the knowledge extracted from the monitored application as a finite state model that represents both the sequences of operations that can be executed by the users and the corresponding internal computations that might be activated by each operation. Our initial assessment with information extracted from ArgoUML shows that Controlled Burst Recording can reconstruct behavioral information more effectively than competing sampling techniques, with a low impact on the system response time.

Submitted 8 February, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: accepted at ICST2020 https://icst2020.info/

MSC Class: 68 ACM Class: D.2

arXiv:2002.01785 [pdf, other]

A Framework for In-Vivo Testing of Mobile Applications

Authors: Mariano Ceccato, Davide Corradini, Luca Gazzola, Fitsum Meshesha Kifetew, Leonardo Mariani, Matteo Orrù, Paolo Tonella

Abstract: The ecosystem in which mobile applications run is highly heterogeneous and configurable. All layers upon which mobile apps are built offer wide possibilities of variations, from the device and the hardware, to the operating system and middleware, up to the user preferences and settings. Testing all possible configurations exhaustively, before releasing the app, is unaffordable. As a consequence, the app may exhibit different, including faulty, behaviours when executed in the field, under specific configurations. In this paper, we describe a framework that can be instantiated to support in-vivo testing of a mobile app. The framework monitors the configuration in the field and triggers in-vivo testing when an untested configuration is recognized. Experimental results show that the overhead introduced by monitoring is unnoticeable to negligible (i.e., 0-6%) depending on the device being used (high- vs. low-end). In-vivo test execution required on average 3s: if performed upon screen lock activation, it introduces just a slight delay before locking the device.

Submitted 5 February, 2020; originally announced February 2020.

Comments: Research paper accepted to ICST'20, 10+1 pages

arXiv:2001.07283 [pdf, other]

doi 10.1016/j.jss.2020.110523

In-The-Field Monitoring of Functional Calls: Is It Feasible?

Authors: Oscar Cornejo, Daniela Briola, Daniela Micucci, Leonardo Mariani

Abstract: Collecting data about the sequences of function calls executed by an application while running in the field can be useful to a number of applications, including failure reproduction, profiling, and debugging. Unfortunately, collecting data from the field may introduce annoying slowdowns that negatively affect the quality of the user experience. So far, the impact of monitoring has been mainly studied in terms of the overhead that it may introduce in the monitored applications, rather than considering whether the introduced overhead can actually be perceived by users. In this paper we take a different perspective, studying to what extent collecting data about sequences of function calls may impact the quality of the user experience, producing recognizable effects. Interestingly, we found that, depending on the nature of the executed operation and its execution context, users may tolerate a non-trivial overhead. This information can be potentially exploited to collect a significant amount of data without annoying users.

Submitted 20 January, 2020; originally announced January 2020.

MSC Class: 68 ACM Class: D.2

arXiv:1911.09561 [pdf, other]

Predicting Failures in Multi-Tier Distributed Systems

Authors: Leonardo Mariani, Mauro Pezzè, Oliviero Riganelli, Rui Xin

Abstract: Many applications are implemented as multi-tier software systems, and are executed on distributed infrastructures, like cloud infrastructures, to benefit from the cost reduction that derives from dynamically allocating resources on-demand. In these systems, failures are becoming the norm rather than the exception, and predicting their occurrence, as well as locating the responsible faults, are essential enablers of preventive and corrective actions that can mitigate the impact of failures, and significantly improve the dependability of the systems. Current failure prediction approaches suffer either from false positives or limited accuracy, and do not produce enough information to effectively locate the responsible faults. In this paper, we present PreMiSE, a lightweight and precise approach to predict failures and locate the corresponding faults in multi-tier distributed systems. PreMiSE blends anomaly-based and signature-based techniques to identify multi-tier failures that impact on performance indicators, with high precision and low false positive rate. The experimental results that we obtained on a Cloud-based IP Multimedia Subsystem indicate that PreMiSE can indeed predict and locate possible failure occurrences with high precision and low overhead.

Submitted 21 November, 2019; originally announced November 2019.

Comments: Accepted for publication in Journal of Systems and Software

arXiv:1911.09388 [pdf, other]

FILO: FIx-LOcus Recommendation for Problems Caused by Android Framework Upgrade

Authors: Marco Mobilio, Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Dealing with the evolution of operating systems is challenging for developers of mobile apps, who have to deal with frequent upgrades that often include backward incompatible changes of the underlying API framework. As a consequence of framework upgrades, apps may show misbehaviours and unexpected crashes once executed within an evolved environment. Identifying the portion of the app that must be modified to correctly execute on a newly released operating system can be challenging. Although incompatibilities are visible at the level of the interactions between the app and its execution environment, the actual methods to be changed are often located in classes that do not directly interact with any external element. To facilitate debugging activities for problems introduced by backward incompatible upgrades of the operating system, this paper presents FILO, a technique that can recommend the method that must be changed to implement the fix from the analysis of a single failing execution. FILO can also select key symptomatic anomalous events that can help the developer understand the reason for the failure and facilitate the implementation of the fix. Our evaluation with multiple known compatibility problems introduced by Android upgrades shows that FILO can effectively and efficiently identify the faulty methods in the apps.

Submitted 21 November, 2019; originally announced November 2019.

Comments: accepted for inclusion in Proceedings of the International Symposium on Software Reliability Engineering (ISSRE) 2019

arXiv:1911.09357 [pdf, other]

Controlling Interactions with Libraries in Android Apps Through Runtime Enforcement

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Android applications are executed on smartphones equipped with a variety of resources that must be properly accessed and controlled, otherwise the correctness of the executions and the stability of the entire environment might be negatively affected. For example, apps must properly acquire, use, and release microphones, cameras, and other multimedia devices, otherwise the behavior of the apps that use the same resources might be compromised. Unfortunately, several apps do not use resources correctly, for instance due to faults and inaccurate design decisions. By interacting with these apps users may experience unexpected behaviors, which in turn may cause instability and sporadic failures, especially when resources are accessed. In this paper, we present an approach that lets users protect their environment from the apps that use resources improperly by enforcing the correct usage protocol. This is achieved by using software enforcers that can observe executions and change them when necessary. For instance, enforcers can detect that a resource has been acquired but not released, and automatically perform the release operation, thus making that same resource available to the other apps. The main idea is that software libraries, in particular the ones controlling access to resources, can be augmented with enforcers that can be activated and deactivated on demand by users to protect their environment from unwanted app behaviors.
We call the software libraries augmented with one or more enforcers proactive libraries because the activation of the enforcer decorates the library with proactive behaviors that can guarantee the correctness of the execution despite the invocation of the operations implemented by the library.

Submitted 21 November, 2019; originally announced November 2019.

Comments: accepted for publication in ACM Transactions on Autonomous and Adaptive Systems (TAAS). Special Section on Best Papers from SEAMS 2017 (The conference paper is arXiv:1703.08005)

arXiv:1909.08378 [pdf, other]

Anomaly Detection As-a-Service

Authors: Marco Mobilio, Matteo Orrù, Oliviero Riganelli, Alessandro Tundo, Leonardo Mariani

Abstract: Cloud systems are complex, large, and dynamic systems whose behavior must be continuously analyzed to timely detect misbehaviors and failures. Although there are solutions to flexibly monitor cloud systems, cost-effectively controlling the anomaly detection logic is still a challenge. In particular, cloud operators may need to quickly change the types of detected anomalies and the scope of anomaly detection, for instance based on observations. This kind of intervention still consists of a largely manual and inefficient ad-hoc effort. In this paper, we present Anomaly Detection as-a-Service (ADaaS), which uses the same as-a-service paradigm often exploited in cloud systems to declaratively control the anomaly detection logic. Operators can use ADaaS to specify the set of indicators that must be analyzed and the types of anomalies that must be detected, without having to address any operational aspect. Early results with lightweight detectors show that the presented approach is a promising solution to deliver better control of the anomaly detection logic.

Submitted 18 September, 2019; originally announced September 2019.

Comments: Paper accepted at the Intl. Workshop on Governing Adaptive and Unplanned Systems of Systems (GAUSS)

arXiv:1905.11040 [pdf, ps, other]

A Benchmark of Data Loss Bugs for Android Apps

Authors: Oliviero Riganelli, Marco Mobilio, Daniela Micucci, Leonardo Mariani

Abstract: Android apps must be able to deal with both stop events, which require immediately stopping the execution of the app without losing state information, and start events, which require resuming the execution of the app at the same point it was stopped. Support for these kinds of events must be explicitly implemented by developers, who unfortunately often fail to implement the proper logic for saving and restoring the state of an app. As a consequence, apps can lose data when moved to background and then back to foreground (e.g., to answer a call) or when the screen is simply rotated. These faults can be the cause of annoying usability issues and unexpected crashes. This paper presents a public benchmark of 110 data loss faults in Android apps that we systematically collected to facilitate research and experimentation with these problems. The benchmark is available on GitLab and includes the faulty apps, the fixed apps (when available), the test cases to automatically reproduce the problems, and additional information that may help researchers in their tasks.

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1903.12468 [pdf, other]

Automatic Failure Explanation in CPS Models

Authors: Ezio Bartocci, Niveditha Manjunath, Leonardo Mariani, Cristinel Mateis, Dejan Ničković

Abstract: Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, the detection of a failure alone is insufficient to know how to correct a faulty model. Faults can propagate in time and in space, producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason for an observed failure is typically a challenging and laborious task left to the experience and domain knowledge of the designer. In this paper, we propose CPSDebug, a novel approach that, by combining testing, specification mining, and failure analysis, can automatically explain failures in Simulink/Stateflow models. We evaluate CPSDebug on two case studies, involving two use scenarios and several classes of faults, demonstrating the potential value of our approach.

Submitted 29 March, 2019; originally announced March 2019.

arXiv:1902.03776 [pdf, other]

COST Action IC 1402 ArVI: Runtime Verification Beyond Monitoring -- Activity Report of Working Group 1

Authors: Wolfgang Ahrendt, Cyrille Artho, Christian Colombo, Yliès Falcone, Srdan Krstic, Martin Leucker, Florian Lorber, Joao Lourenço, Leonardo Mariani, César Sánchez, Gerardo Schneider, Volker Stolz

Abstract: This report presents the activities of the first working group of the COST Action ArVI, Runtime Verification beyond Monitoring. The report aims to provide an overview of some of the major core aspects involved in Runtime Verification. Runtime Verification is the field of research dedicated to the analysis of system executions. It is often seen as a discipline that studies how a system run satisfies or violates correctness properties. The report exposes a taxonomy of Runtime Verification (RV) presenting the terminology involved with the main concepts of the field. The report also develops the concept of instrumentation, the various ways to instrument systems, and the fundamental role of instrumentation in designing an RV framework. We also discuss how RV interplays with other verification techniques such as model-checking, deductive verification, model learning, testing, and runtime assertion checking. Finally, we propose challenges in monitoring quantitative and statistical data beyond detecting property violation.

Submitted 11 February, 2019; originally announced February 2019.

arXiv:1810.04893 [pdf, other]

Increasing the Reusability of Enforcers with Lifecycle Events

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Runtime enforcement can be effectively used to improve the reliability of software applications. However, it often requires the definition of ad hoc policies and enforcement strategies, which might be expensive to identify and implement. This paper discusses how to exploit lifecycle events to obtain useful enforcement strategies that can be easily reused across applications, thus reducing the cost of adoption of the runtime enforcement technology. The paper finally sketches how this idea can be used to define libraries that can automatically overcome problems related to applications misusing them.

Submitted 11 October, 2018; originally announced October 2018.

Comments: International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA'18) [Invited Talk Paper]

arXiv:1807.07460 [pdf, ps, other]

Model-Based Monitoring for IoTs Smart Cities Applications

Authors: Matteo Orrù, Marco Mobilio, Anas Shatnawi, Oliviero Riganelli, Alessandro Tundo, Leonardo Mariani

Abstract: Smart Cities are future urban aggregations, where a multitude of heterogeneous systems and IoT devices interact to provide a safer, more efficient, and greener environment. The vision of smart cities is adapting accordingly to the evolution of software and IoT-based services. The current trend is not to have a big comprehensive system, but a plethora of small, well-integrated systems that interact with one another. Monitoring these kinds of systems is challenging for a number of reasons.

Submitted 19 July, 2018; originally announced July 2018.

arXiv:1803.05233 [pdf, other]

doi 10.1145/3194124.3194130

CloudHealth: A Model-Driven Approach to Watch the Health of Cloud Services

Authors: Anas Shatnawi, Matteo Orrù, Marco Mobilio, Oliviero Riganelli, Leonardo Mariani

Abstract: Cloud systems are complex and large systems where services provided by different operators must coexist and eventually cooperate. In such a complex environment, controlling the health of both the whole environment and the individual services is extremely important to timely and effectively react to misbehaviours, unexpected events, and failures. Although there are solutions to monitor cloud systems at different granularity levels, how to relate the many KPIs that can be collected about the health of the system and how health information can be properly reported to operators are open questions. This paper reports the early results we achieved in the challenge of monitoring the health of cloud systems. In particular we present CloudHealth, a model-based health monitoring approach that can be used by operators to watch specific quality attributes. The CloudHealth Monitoring Model describes how to operationalize high level monitoring goals by dividing them into subgoals, deriving metrics for the subgoals, and using probes to collect the metrics. We use the CloudHealth Monitoring Model to control the probes that must be deployed on the target system, the KPIs that are dynamically collected, and the visualization of the data in dashboards.

Submitted 14 March, 2018; originally announced March 2018.

Comments: 8 pages, 2 figures, 1 table

arXiv:1803.00356 [pdf, other]

Localizing Faults in Cloud Systems

Authors: Leonardo Mariani, Cristina Monni, Mauro Pezzé, Oliviero Riganelli, Rui Xin

Abstract: By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to optimize the operative costs of software systems, but impacts significantly on the reliability of software applications. The lack of control of applications over Cloud execution environments largely limits the applicability of state-of-the-art approaches that address reliability issues by relying on heavyweight training with injected faults. In this paper, we propose LOUD, a lightweight fault localization approach that relies on positive training only, and can thus operate within the constraints of Cloud systems. LOUD relies on machine learning and graph theory. It trains machine learning models with correct executions only, and compensates for the inaccuracy that derives from training with positive samples by elaborating the outcome of machine learning techniques with graph theory algorithms. The experimental results reported in this paper confirm that LOUD can localize faults with high precision by relying only on lightweight positive training.

Submitted 1 March, 2018; originally announced March 2018.

Comments: 12 pages, 8 figures, paper accepted at ICST 2018

arXiv:1708.09494 [pdf, other]

An Exploratory Study of Field Failures

Authors: Luca Gazzola, Leonardo Mariani, Fabrizio Pastore, Mauro Pezzè

Abstract: Field failures, that is, failures caused by faults that escape the testing phase leading to failures in the field, are unavoidable. Improving verification and validation activities before deployment can identify and timely remove many but not all faults, and users may still experience a number of annoying problems while using their software systems. This paper investigates the nature of field failures, to understand to what extent further improving in-house verification and validation activities can reduce the number of failures in the field, and frames the need of new approaches that operate in the field. We report the results of the analysis of the bug reports of five applications belonging to three different ecosystems, propose a taxonomy of field failures, and discuss the reasons why failures belonging to the identified classes cannot be detected at design time but shall be addressed at runtime. We observe that many faults (70%) are intrinsically hard to detect at design-time.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1708.07232 [pdf, other]

doi 10.4204/EPTCS.254.5

Fragmented Monitoring

Authors: Oscar Cornejo, Daniela Briola, Daniela Micucci, Leonardo Mariani

Abstract: Field data is an invaluable source of information for testers and developers because it witnesses how software systems operate in real environments, capturing scenarios and configurations relevant to end-users. Unfortunately, collecting traces might be resource-consuming and can significantly affect the user experience, for instance causing annoying slowdowns. Existing monitoring techniques can control the overhead introduced in the applications by reducing the amount of collected data, for instance by collecting each event only with a given probability. However, collecting fewer events limits the amount of information extracted from the field and may fail in providing a comprehensive picture of the behavior of a program. In this paper we present fragmented monitoring, a monitoring technique that addresses the issue of collecting information from the field without annoying users. The key idea of fragmented monitoring is to reduce the overhead by recording partial traces (fragments) instead of full traces, while annotating the beginning and the end of each fragment with state information. These annotations are exploited offline to derive traces that might be likely observed in the field and that could not be collected directly due to the overhead that would be introduced in a program.

Submitted 23 August, 2017; originally announced August 2017.

Comments: In Proceedings PrePost 2017, arXiv:1708.06889

ACM Class: D.2.5

Journal ref: EPTCS 254, 2017, pp. 57-68

arXiv:1708.02052 [pdf, other]

doi 10.1145/3106237.3122819

VART: A Tool for the Automatic Detection of Regression Faults

Authors: Fabrizio Pastore, Leonardo Mariani

Abstract: In this paper we present VART, a tool for automatically revealing regression faults missed by regression test suites. Interestingly, VART is not limited to faults causing crashes or exceptions, but can reveal faults that cause the violation of application-specific correctness properties. VART achieves this goal by combining static and dynamic program analysis.

Submitted 7 August, 2017; originally announced August 2017.

arXiv:1708.01650 [pdf, other]

BDCI: Behavioral Driven Conflict Identification

Authors: Fabrizio Pastore, Leonardo Mariani, Daniela Micucci

Abstract: Source Code Management (SCM) systems support software evolution by providing features such as version control, branching, and conflict detection. Despite the presence of these features, support for parallel software development is often limited. SCM systems can only address a subset of the conflicts that might be introduced by developers when concurrently working on multiple parallel branches. In fact, SCM systems can detect textual conflicts, which are generated by the concurrent modification of the same program locations, but they are unable to detect higher-order conflicts, which are generated by the concurrent modification of different program locations that generate program misbehaviors once merged. Higher-order conflicts are painful to detect and expensive to fix because they might originate from the interference of apparently unrelated changes. In this paper we present Behavioral Driven Conflict Identification (BDCI), a novel approach to conflict detection. BDCI moves the analysis of conflicts from the source code level to the level of program behavior by generating and comparing behavioral models. The analysis based on behavioral models can reveal interfering changes as soon as they are introduced in the SCM system, even if they do not introduce any textual conflict. To evaluate the effectiveness and the cost of the proposed approach, we developed BDCIf, a specific instance of BDCI dedicated to the detection of higher-order conflicts related to the functional behavior of a program.
The evidence collected by analyzing multiple versions of Git and Redis suggests that BDCIf can effectively detect higher-order conflicts and report how changes might interfere.

Submitted 4 August, 2017; originally announced August 2017.

arXiv:1707.07473 [pdf, other]

Verifying Policy Enforcers

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani, Yliès Falcone

Abstract: Policy enforcers are sophisticated runtime components that can prevent failures by enforcing the correct behavior of the software. While a single enforcer can be easily designed focusing only on the behavior of the application that must be monitored, the effect of multiple enforcers that enforce different policies might be hard to predict. So far, mechanisms to resolve interferences between enforcers have been based on priority mechanisms and heuristics. Although these methods provide a mechanism to take decisions when multiple enforcers try to affect the execution at the same time, they do not guarantee the lack of interference on the global behavior of the system. In this paper we present a verification strategy that can be exploited to discover interferences between sets of enforcers and thus safely identify a priori the enforcers that can co-exist at run-time. In our evaluation, we applied our verification method to several policy enforcers for Android and discovered some incompatibilities.

Submitted 26 July, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

Comments: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani, and Yliès Falcone. Verifying Policy Enforcers. Proceedings of 17th International Conference on Runtime Verification (RV), 2017. (to appear)

arXiv:1705.08418 [pdf, other]

doi 10.1007/978-3-319-47169-3

Dynamic Analysis of Regression Problems in Industrial Systems: Challenges and Solutions

Authors: Fabrizio Pastore, Leonardo Mariani

Abstract: This paper presents the result of our experience with the application of runtime verification, testing and static analysis techniques to several industrial projects. We discuss the eight most relevant challenges that we experienced, and the strategies that we elaborated to face them.

Submitted 23 May, 2017; originally announced May 2017.

arXiv:1705.08399 [pdf, other]

Timed k-Tail: Automatic Inference of Timed Automata

Authors: Fabrizio Pastore, Daniela Micucci, Leonardo Mariani

Abstract: Accurate and up-to-date models describing the behavior of software systems are seldom available in practice. To address this issue, software engineers may use specification mining techniques, which can automatically derive models that capture the behavior of the system under analysis. So far, most specification mining techniques focused on the functional behavior of the systems, with specific emphasis on models that represent the ordering of operations, such as temporal rules and finite state models. Although useful, these models are inherently partial. For instance, they miss the timing behavior, which is extremely relevant for many classes of systems and components, such as shared libraries and user-driven applications. Mining specifications that include both the functional and the timing aspects can improve the applicability of many testing and analysis solutions. This paper addresses this challenge by presenting the Timed k-Tail (TkT) specification mining technique that can mine timed automata from program traces. Since timed automata can effectively represent the interplay between the functional and the timing behavior of a system, TkT could be exploited in those contexts where time-related information is relevant. Our empirical evaluation shows that TkT can efficiently and effectively mine accurate models. The mined models have been used to identify executions with anomalous timing. The evaluation shows that most of the anomalous executions have been correctly identified while producing few false positives.

Submitted 23 May, 2017; originally announced May 2017.

arXiv:1705.06511 [pdf, other]

In The Field Monitoring of Interactive Applications

Authors: Oscar Cornejo, Daniela Briola, Daniela Micucci, Leonardo Mariani

Abstract: Monitoring techniques can extract accurate data about the behavior of software systems. When used in the field, they can reveal how applications behave in real-world contexts and how programs are actually exercised by their users. Nevertheless, since monitoring might need significant storage and computational resources, it may interfere with users' activities, degrading the quality of the user experience. While the impact of monitoring has been typically studied by measuring the overhead that it may introduce in a monitored application, there is little knowledge about how monitoring solutions may actually impact the user experience and to what extent users may recognize their presence. In this paper, we present our investigation on how collecting data in the field may impact the quality of the user experience. Our initial results show that non-trivial overhead can be tolerated by users, depending on the kind of activity that is performed. This opens interesting opportunities for research in monitoring solutions, which could be designed to opportunistically

Submitted 18 May, 2017; originally announced May 2017.

arXiv:1703.08005 [pdf, other]

doi 10.1109/SEAMS.2017.9

Policy Enforcement with Proactive Libraries

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Software libraries implement APIs that deliver reusable functionalities. To correctly use these functionalities, software applications must satisfy certain correctness policies, for instance policies about the order some API methods can be invoked and about the values that can be used for the parameters. If these policies are violated, applications may produce misbehaviors and failures at runtime. Although this problem is general, applications that incorrectly use API methods are more frequent in certain contexts. For instance, Android provides a rich and rapidly evolving set of APIs that might be used incorrectly by app developers who often implement and publish faulty apps in the marketplaces. To mitigate this problem, we introduce the novel notion of proactive library, which augments classic libraries with the capability of proactively detecting and healing misuses at run-time. Proactive libraries blend libraries with multiple proactive modules that collect data, check the correctness policies of the libraries, and heal executions as soon as the violation of a correctness policy is detected. The proactive modules can be activated or deactivated at runtime by the users and can be implemented without requiring any change to the original library and any knowledge about the applications that may use the library. We evaluated proactive libraries in the context of the Android ecosystem. Results show that proactive libraries can automatically overcome several problems related to bad resource usage at the cost of a small overhead.

Submitted 26 May, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

Comments: O. Riganelli, D. Micucci and L. Mariani, "Policy Enforcement with Proactive Libraries" 2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Buenos Aires, Argentina, 2017, pp. 182-192

arXiv:1701.05467 [pdf, other]

doi 10.1109/ISSREW.2016.50

Healing Data Loss Problems in Android Apps

Authors: Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Abstract: Android apps should be designed to cope with stop-start events, which are the events that require stopping and restoring the execution of an app while leaving its state unaltered. These events can be caused by run-time configuration changes, such as a screen rotation, and by context-switches, such as a switch from one app to another. When a stop-start event occurs, Android saves the state of the app, handles the event, and finally restores the saved state. To let Android save and restore the state correctly, apps must provide the appropriate support. Unfortunately, Android developers often implement this support incorrectly, or do not implement it at all. This bad practice causes apps to react incorrectly to stop-start events, thus generating what we define as data loss problems, that is, Android apps that lose user data, behave unexpectedly, and crash due to program variables that lost their values. Data loss problems are difficult to detect because they might be observed only when apps are in specific states and with specific inputs. Covering all the possible cases with testing may require a large number of test cases whose execution must be checked manually to discover whether the app under test has been correctly restored after each stop-start event. It is thus important to complement traditional in-house testing activities with mechanisms that can protect apps as soon as a data loss problem occurs in the field.
In this paper we present DataLossHealer, a technique for automatically identifying and healing data loss problems in the field as soon as they occur. DataLossHealer checks at run-time whether states are recovered correctly, and heals the app when needed. DataLossHealer can learn from experience, incrementally reducing the overhead that is introduced by avoiding monitoring interactions that have been managed correctly by the app in the past.

Submitted 19 January, 2017; originally announced January 2017.

Comments: IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 2016

Showing 1–50 of 50 results for author: Mariani, L