-
Improving web element localization by using a large language model
Authors:
Michel Nass,
Emil Alegroth,
Robert Feldt
Abstract:
Web-based test automation heavily relies on accurately finding web elements. Traditional methods compare attributes but don't grasp the context and meaning of elements and words. The emergence of Large Language Models (LLMs) like GPT-4, which can show human-like reasoning abilities on some tasks, offers new opportunities for software engineering and web element localization. This paper introduces…
▽ More
Web-based test automation heavily relies on accurately finding web elements. Traditional methods compare attributes but don't grasp the context and meaning of elements and words. The emergence of Large Language Models (LLMs) like GPT-4, which can show human-like reasoning abilities on some tasks, offers new opportunities for software engineering and web element localization. This paper introduces and evaluates VON Similo LLM, an enhanced web element localization approach. Using an LLM, it selects the most likely web element from the top-ranked ones identified by the existing VON Similo method, ideally aiming to get closer to human-like selection accuracy. An experimental study was conducted using 804 web element pairs from 48 real-world web applications. We measured the number of correctly identified elements as well as the execution times, comparing the effectiveness and efficiency of VON Similo LLM against the baseline algorithm. In addition, motivations from the LLM were recorded and analyzed for all instances where the original approach failed to find the right web element. VON Similo LLM demonstrated improved performance, reducing failed localizations from 70 to 39 (out of 804), a 44 percent reduction. Despite its slower execution time and additional costs of using the GPT-4 model, the LLMs human-like reasoning showed promise in enhancing web element localization. LLM technology can enhance web element identification in GUI test automation, reducing false positives and potentially lowering maintenance costs. However, further research is necessary to fully understand LLMs capabilities, limitations, and practical use in GUI testing.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Robust web element identification for evolving applications by considering visual overlaps
Authors:
Michel Nass,
Riccardo Coppola,
Emil Alégroth,
Robert Feldt
Abstract:
Fragile (i.e., non-robust) test execution is a common challenge for automated GUI-based testing of web applications as they evolve. Despite recent progress, there is still room for improvement since test execution failures caused by technical limitations result in unnecessary maintenance costs that limit its effectiveness and efficiency. One of the most reported technical challenges for web-based…
▽ More
Fragile (i.e., non-robust) test execution is a common challenge for automated GUI-based testing of web applications as they evolve. Despite recent progress, there is still room for improvement since test execution failures caused by technical limitations result in unnecessary maintenance costs that limit its effectiveness and efficiency. One of the most reported technical challenges for web-based tests concerns how to reliably locate a web element used by a test script. This paper proposes the novel concept of Visually Overlap** Nodes (VON) that reduces fragility by utilizing the phenomenon that visual web elements (observed by the user) are constructed from multiple web-elements in the Document Object Model (DOM) that overlaps visually. We demonstrate the approach in a tool, VON Similo, which extends the state-of-the-art multi-locator approach (Similo) that is also used as the baseline for an experiment. In the experiment, a ground truth set of 1163 manually collected web element pairs, from different releases of the 40 most popular websites on the internet, are used to compare the approaches' precision, recall, and accuracy. Our results show that VON Similo provides 94.7% accuracy in identifying a web element in a new release of the same SUT. In comparison, Similo provides 83.8% accuracy. These results demonstrate the applicability of the visually overlap** nodes concept/tool for web element localization in evolving web applications and contribute a novel way of thinking about web element localization in future research on GUI-based testing.
△ Less
Submitted 13 January, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
Requirements Quality vs Process and Stakeholders' Well-being: A Case of a Nordic Bank
Authors:
Emil Lind,
Javier Gonzalez-Huerta,
Emil Alégroth
Abstract:
Requirements are key artefacts to describe the intended purpose of a software system. The quality of requirements is crucial for deciding what to do next, impacting the development process's effectiveness and efficiency. However, we know very little about the connection between practitioners' perceptions regarding requirements quality and its impact on the process or the feelings of the profession…
▽ More
Requirements are key artefacts to describe the intended purpose of a software system. The quality of requirements is crucial for deciding what to do next, impacting the development process's effectiveness and efficiency. However, we know very little about the connection between practitioners' perceptions regarding requirements quality and its impact on the process or the feelings of the professionals involved in the development process.
Objectives: This study investigates: i) How software development practitioners define requirements quality, ii) how the perceived quality of requirements impact process and stakeholders' well-being, and iii) what are the causes and potential solutions for poor-quality requirements. Method: This study was performed as a descriptive interview study at a sub-organization of a Nordic bank that develops its own web and mobile apps. The data collection comprises interviews with 20 practitioners, including requirements engineers, developers, testers, and newly employed developers, with five interviewees from each group.
Results: The results show that different roles have different views on what makes a requirement good quality. Participants highlighted that, in general, they experience negative emotions, more work, and overhead communication when they work with requirements they perceive to be of poor quality. The practitioners also describe positive effects on their performance and positive feelings when they work with requirements that they perceive to be good.
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
Similarity-based web element localization for robust test automation
Authors:
Michel Nass,
Emil Alégroth,
Robert Feldt,
Maurizio Leotta,
Filippo Ricca
Abstract:
Non-robust (fragile) test execution is a commonly reported challenge in GUI-based test automation, despite much research and several proposed solutions. A test script needs to be resilient to (minor) changes in the tested application but, at the same time, fail when detecting potential issues that require investigation. Test script fragility is a multi-faceted problem, but one crucial challenge is…
▽ More
Non-robust (fragile) test execution is a commonly reported challenge in GUI-based test automation, despite much research and several proposed solutions. A test script needs to be resilient to (minor) changes in the tested application but, at the same time, fail when detecting potential issues that require investigation. Test script fragility is a multi-faceted problem, but one crucial challenge is reliably identifying and locating the correct target web elements when the website evolves between releases or otherwise fails and reports an issue. This paper proposes and evaluates a novel approach called similarity-based web element localization (Similo), which leverages information from multiple web element locator parameters to identify a target element using a weighted similarity score. The experimental study compares Similo to a baseline approach for web element localization. To get an extensive empirical basis, we target 40 of the most popular websites on the Internet in our evaluation. Robustness is considered by counting the number of web elements found in a recent website version compared to how many of these existed in an older version. Results of the experiment show that Similo outperforms the baseline representing the current state-of-the-art; it failed to locate the correct target web element in 72 out of 598 considered cases compared to 146 failed cases for the baseline approach. This study presents evidence that quantifying the similarity between multiple attributes of web elements when trying to locate them, as in our proposed Similo approach, is beneficial. With acceptable efficiency, Similo gives significantly higher effectiveness (i.e., robustness) than the baseline web element localization approach.
△ Less
Submitted 1 August, 2022;
originally announced August 2022.
-
When Traceability Goes Awry: an Industrial Experience Report
Authors:
Davide Fucci,
Emil Alégroth,
Thomas Axelsson
Abstract:
The concept of traceability between artifacts is considered an enabler for software project success. This concept has received plenty of attention from the research community and is by many perceived to always be available in an industrial setting. In this industry-academia collaborative project, a team of researchers, supported by testing practitioners from a large telecommunication company, soug…
▽ More
The concept of traceability between artifacts is considered an enabler for software project success. This concept has received plenty of attention from the research community and is by many perceived to always be available in an industrial setting. In this industry-academia collaborative project, a team of researchers, supported by testing practitioners from a large telecommunication company, sought to investigate the partner company's issues related to software quality. However, it was soon identified that the fundamental traceability links between requirements and test cases were missing. This lack of traceability impeded the implementation of a solution to help the company deal with its quality issues. In this experience report, we discuss lessons learned about the practical value of creating and maintaining traceability links in complex industrial settings and provide a cautionary tale for researchers.
△ Less
Submitted 9 June, 2022;
originally announced June 2022.
-
Compliance Requirements in Large-Scale Software Development: An Industrial Case Study
Authors:
Muhammad Usman,
Michael Felderer,
Michael Unterkalmsteiner,
Eriks Klotins,
Daniel Mendez,
Emil Alegroth
Abstract:
Regulatory compliance is a well-studied area, including research on how to model, check, analyse, enact, and verify compliance of software. However, while the theoretical body of knowledge is vast, empirical evidence on challenges with regulatory compliance, as faced by industrial practitioners particularly in the Software Engineering domain, is still lacking. In this paper, we report on an indust…
▽ More
Regulatory compliance is a well-studied area, including research on how to model, check, analyse, enact, and verify compliance of software. However, while the theoretical body of knowledge is vast, empirical evidence on challenges with regulatory compliance, as faced by industrial practitioners particularly in the Software Engineering domain, is still lacking. In this paper, we report on an industrial case study which aims at providing insights into common practices and challenges with checking and analysing regulatory compliance, and we discuss our insights in direct relation to the state of reported evidence. Our study is performed at Ericsson AB, a large telecommunications company, which must comply to both locally and internationally governing regulatory entities and standards such as GDPR. The main contributions of this work are empirical evidence on challenges experienced by Ericsson that complement the existing body of knowledge on regulatory compliance.
△ Less
Submitted 2 March, 2021;
originally announced March 2021.
-
A Taxonomy of Assets for the Development of Software-Intensive Products and Services
Authors:
Ehsan Zabardast,
Javier Gonzalez-Huerta,
Tony Gorschek,
Darja Šmite,
Emil Alégroth,
Fabian Fagerholm
Abstract:
Context: Develo** software-intensive products or services usually involves a plethora of software artefacts. Assets are artefacts intended to be used more than once and have value for organisations; examples include test cases, code, requirements, and documentation. During the development process, assets might degrade, affecting the effectiveness and efficiency of the development process. Theref…
▽ More
Context: Develo** software-intensive products or services usually involves a plethora of software artefacts. Assets are artefacts intended to be used more than once and have value for organisations; examples include test cases, code, requirements, and documentation. During the development process, assets might degrade, affecting the effectiveness and efficiency of the development process. Therefore, assets are an investment that requires continuous management. Identifying assets is the first step for their effective management. However, there is a lack of awareness of what assets and types of assets are common in software-develo** organisations. Most types of assets are understudied, and their state of quality and how they degrade over time have not been well-understood. Method: We perform a systematic literature review and a field study at five companies to study and identify assets to fill the gap in research. The results were analysed qualitatively and summarised in a taxonomy. Results: We create the first comprehensive, structured, yet extendable taxonomy of assets, containing 57 types of assets. Conclusions: The taxonomy serves as a foundation for identifying assets that are relevant for an organisation and enables the study of asset management and asset degradation concepts.
△ Less
Submitted 21 October, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Software Engineers' Information Seeking Behavior in Change Impact Analysis - An Interview Study
Authors:
Markus Borg,
Emil Alégroth,
Per Runeson
Abstract:
Software engineers working in large projects must navigate complex information landscapes. Change Impact Analysis (CIA) is a task that relies on engineers' successful information seeking in databases storing, e.g., source code, requirements, design descriptions, and test case specifications. Several previous approaches to support information seeking are task-specific, thus understanding engineers'…
▽ More
Software engineers working in large projects must navigate complex information landscapes. Change Impact Analysis (CIA) is a task that relies on engineers' successful information seeking in databases storing, e.g., source code, requirements, design descriptions, and test case specifications. Several previous approaches to support information seeking are task-specific, thus understanding engineers' seeking behavior in specific tasks is fundamental. We present an industrial case study on how engineers seek information in CIA, with a particular focus on traceability and development artifacts that are not source code. We show that engineers have different information seeking behavior, and that some do not consider traceability particularly useful when conducting CIA. Furthermore, we observe a tendency for engineers to prefer less rigid types of support rather than formal approaches, i.e., engineers value support that allows flexibility in how to practically conduct CIA. Finally, due to diverse information seeking behavior, we argue that future CIA support should embrace individual preferences to identify change impact by empowering several seeking alternatives, including searching, browsing, and tracing.
△ Less
Submitted 6 March, 2017;
originally announced March 2017.
-
Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing
Authors:
Emil Alégroth,
Robert Feldt,
Pirjo Kolström
Abstract:
Context: Verification and validation (V&V) activities make up 20 to 50 percent of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs but available research only provides limited empirical data from industrial practice about the maintenance costs of automated tests and what factors affect these costs. In particular, these costs and fac…
▽ More
Context: Verification and validation (V&V) activities make up 20 to 50 percent of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs but available research only provides limited empirical data from industrial practice about the maintenance costs of automated tests and what factors affect these costs. In particular, these costs and factors are unknown for automated GUI-based testing.
Objective: This paper addresses this lack of knowledge through analysis of the costs and factors associated with the maintenance of automated GUI-based tests in industrial practice.
Method: An empirical study at two companies, Siemens and Saab, is reported where interviews about, and empirical work with, Visual GUI Testing is performed to acquire data about the technique's maintenance costs and feasibility.
Results: 13 factors are observed that affect maintenance, e.g. tester knowledge/experience and test case complexity. Further, statistical analysis shows that develo** new test scripts is costlier than maintenance but also that frequent maintenance is less costly than infrequent, big bang maintenance. In addition a cost model, based on previous work, is presented that estimates the time to positive return on investment (ROI) of test automation compared to manual testing.
Conclusions: It is concluded that test automation can lower overall software development costs of a project whilst also having positive effects on software quality. However, maintenance costs can still be considerable and the less time a company currently spends on manual testing, the more time is required before positive, economic, ROI is reached after automation.
△ Less
Submitted 3 February, 2016;
originally announced February 2016.