-
Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation
Authors:
Wendkûuni C. Ouédraogo,
Kader Kaboré,
Haoye Tian,
Yewei Song,
Anil Koyuncu,
Jacques Klein,
David Lo,
Tegawendé F. Bissyandé
Abstract:
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However…
▽ More
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However, their effectiveness remains unclear.
This study conducts the first comprehensive investigation of LLMs, evaluating the effectiveness of four LLMs and five prompt engineering techniques, for unit test generation. We analyze 216\,300 tests generated by the selected advanced instruct-tuned LLMs for 690 Java classes collected from diverse datasets. We assess correctness, understandability, coverage, and bug detection capabilities of LLM-generated tests, comparing them to EvoSuite, a popular automated testing tool. While LLMs show potential, improvements in test correctness are necessary. This study reveals the strengths and limitations of LLMs compared to traditional methods, paving the way for further research on LLMs in software engineering.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
Can Edge Computing fulfill the requirements of automated vehicular services using 5G network ?
Authors:
Wendlasida Ouedraogo,
Andrea Araldo,
Badii Jouaber,
Hind Castel,
Remy Grunblatt
Abstract:
Communication and computation services supporting Connected and Automated Vehicles (CAVs) are characterized by stringent requirements, in terms of response time and reliability. Fulfilling these requirements is crucial for ensuring road safety and traffic optimization. The conceptually simple solution of hosting these services in the vehicles increases their cost (mainly due to the installation an…
▽ More
Communication and computation services supporting Connected and Automated Vehicles (CAVs) are characterized by stringent requirements, in terms of response time and reliability. Fulfilling these requirements is crucial for ensuring road safety and traffic optimization. The conceptually simple solution of hosting these services in the vehicles increases their cost (mainly due to the installation and maintenance of computation infrastructure) and may drain their battery excessively. Such disadvantages can be tackled via Multi-Access Edge Computing (MEC), consisting in deploying computation capability in network nodes deployed close to the devices (vehicles in this case), such as to satisfy the stringent CAV requirements. However, it is not yet clear under which conditions MEC can support CAV requirements and for which services. To shed light on this question, we conduct a simulation campaign using well-known open-source simulation tools, namely OMNeT++, Simu5G, Veins, INET, and SUMO. We are thus able to provide a reality check on MEC for CAV, pinpointing what are the computation capacities that must be installed in the MEC, to support the different services, and the amount of vehicles that a single MEC node can support. We find that such parameters must vary a lot, depending on the service considered. This study can serve as a preliminary basis for network operators to plan future deployment of MEC to support CAV.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports
Authors:
Wendkûuni C. Ouédraogo,
Laura Plein,
Kader Kaboré,
Andrew Habib,
Jacques Klein,
David Lo,
Tegawendé F. Bissyandé
Abstract:
The quality of a software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test cases generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current effor…
▽ More
The quality of a software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test cases generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current efforts often do not lead to the generation of relevant inputs, which limits the efficiency of automatically generated tests. Towards improving the relevance of test inputs, we present \name, a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools. In this work, we investigate the performance of using inputs extracted from bug reports with \name to generate test cases with Evosuite. The evaluation is performed on the Defects4J benchmark. For Defects4J projects, our study has shown that \name successfully extracted 68.68\% of relevant inputs when using regular expression in its approach versus 50.21\% relevant inputs without regular expression. Further, our study has shown the potential to improve the Line and Instruction Coverage across all projects. Overall, we successfully collected relevant inputs that led to the detection of 45 bugs that were previously undetected by the baseline.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models
Authors:
Laura Plein,
Wendkûuni C. Ouédraogo,
Jacques Klein,
Tegawendé F. Bissyandé
Abstract:
Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in the area of automatic test generation. Because existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications, most testing procedures still rely on test cases written by humans to form test suite…
▽ More
Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in the area of automatic test generation. Because existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications, most testing procedures still rely on test cases written by humans to form test suites. Such test suites, however, are incomplete: they only cover parts of the project or they are produced after the bug is fixed. Yet, several research challenges, such as automatic program repair, and practitioner processes, build on the assumption that available test suites are sufficient. There is thus a need to break existing barriers in automatic test case generation. While prior work largely focused on random unit testing inputs, we propose to consider generating test cases that realistically represent complex user execution scenarios, which reveal buggy behaviour. Such scenarios are informally described in bug reports, which should therefore be considered as natural inputs for specifying bug-triggering test cases. In this work, we investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs. Our experiments include the use of ChatGPT, as an online service, as well as CodeGPT, a code-related pre-trained LLM that was fine-tuned for our task. Overall, we experimentally show that bug reports associated to up to 50% of Defects4J bugs can prompt ChatGPT to generate an executable test case. We show that even new bug reports can indeed be used as input for generating executable test cases. Finally, we report experimental results which confirm that LLM-generated test cases are immediately useful in software engineering tasks such as fault localization as well as patch validation in automated program repair.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Coqlex: Generating Formally Verified Lexers
Authors:
Wendlasida Ouedraogo,
Gabriel Scherer,
Lutz Strassburger
Abstract:
A compiler consists of a sequence of phases going from lexical analysis to code generation. Ideally, the formal verification of a compiler should include the formal verification of each component of the tool-chain. An example is the CompCert project, a formally verified C compiler, that comes with associated tools and proofs that allow to formally verify most of those components. However, some com…
▽ More
A compiler consists of a sequence of phases going from lexical analysis to code generation. Ideally, the formal verification of a compiler should include the formal verification of each component of the tool-chain. An example is the CompCert project, a formally verified C compiler, that comes with associated tools and proofs that allow to formally verify most of those components. However, some components, in particular the lexer, remain unverified. In fact, the lexer of Compcert is generated using OCamllex, a lex-like OCaml lexer generator that produces lexers from a set of regular expressions with associated semantic actions. Even though there exist various approaches, like CakeML or Verbatim++, to write verified lexers, they all have only limited practical applicability. In order to contribute to the end-to-end verification of compilers, we implemented a generator of verified lexers whose usage is similar to OCamllex. Our software, called Coqlex, reads a lexer specification and generates a lexer equipped with a Coq proof of its correctness. It provides a formally verified implementation of most features of standard, unverified lexer generators.
The conclusions of our work are two-fold: Firstly, verified lexers gain to follow a user experience similar to lex/flex or OCamllex, with a domain-specific syntax to write lexers comfortably. This introduces a small gap between the written artifact and the verified lexer, but our design minimizes this gap and makes it practical to review the generated lexer. The user remains able to prove further properties of their lexer. Secondly, it is possible to combine simplicity and decent performance. Our implementation approach that uses Brzozowski derivatives is noticeably simpler than the previous work in Verbatim++ that tries to generate a deterministic finite automaton (DFA) ahead of time, and it is also noticeably faster thanks to careful design choices.
We wrote several example lexers that suggest that the convenience of using Coqlex is close to that of standard verified generators, in particular, OCamllex. We used Coqlex in an industrial project to implement a verified lexer of Ada. This lexer is part of a tool to optimize safety-critical programs, some of which are very large. This experience confirmed that Coqlex is usable in practice, and in particular that its performance is good enough. Finally, we performed detailed performance comparisons between Coqlex, OCamllex, and Verbatim++. Verbatim++ is the state-of-the-art tool for verified lexers in Coq, and the performance of its lexer was carefully optimized in previous work by Egolf and al. (2022). Our results suggest that Coqlex is two orders of magnitude slower than OCamllex, but two orders of magnitude faster than Verbatim++. Verified compilers and other language-processing tools are becoming important tools for safety-critical or security-critical applications. They provide trust and replace more costly approaches to certification, such as manually reading the generated code. Verified lexers are a missing piece in several Coq-based verified compilers today. Coqlex comes with safety guarantees, and thus shows that it is possible to build formally verified front-ends.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.