User Centric Evaluation of Code Generation Tools

Miah, Tanha; Zhu, Hong

arXiv:2402.03130 (cs)

[Submitted on 5 Feb 2024 (v1), last revised 18 Jun 2024 (this version, v3)]

Abstract: With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as intelligent tools for generating program code from natural language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans. When deciding whether to use an LLM in software production, it is desirable to evaluate its usability. This paper proposes a user centric method for this purpose. The method includes metadata in the test cases of a benchmark to describe their usage, conducts testing in a multi-attempt process that mimics how LLMs are used, measures LLM-generated solutions on a set of quality attributes that reflect usability, and evaluates performance based on user experience in using LLMs as a tool.
The paper also reports a case study that applies the method to evaluate ChatGPT's usability as a code generation tool for the R programming language. Our experiments demonstrated that ChatGPT is highly useful for generating R program code, although it may fail on hard programming tasks. The user experience is good, with an overall average of 1.61 attempts and an average completion time of 47.02 seconds. Our experiments also found that the weakest aspect of usability is conciseness, which has a score of 3.80 out of 5.
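
The headline figures in the abstract (average number of attempts, average completion time, per-attribute scores out of 5) are aggregates over per-task records. The following is a minimal sketch of how such a summary might be computed, not the authors' implementation: the `TaskResult` record, its field names, and the `summarize` helper are hypothetical, and any attribute name other than conciseness is an assumption made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One benchmark task after a multi-attempt session (hypothetical schema)."""
    task_id: str
    attempts: int            # attempts made before stopping
    seconds: float           # total time spent on the task
    solved: bool             # whether an acceptable solution was reached
    scores: dict[str, int]   # quality attribute -> score on a 1..5 scale


def summarize(results: list[TaskResult]) -> dict:
    """Aggregate per-task records into user-experience style summary figures."""
    if not results:
        raise ValueError("no results to summarize")
    # Collect every attribute that was scored on at least one task.
    attributes = sorted({name for r in results for name in r.scores})
    attribute_means = {
        name: mean(r.scores[name] for r in results if name in r.scores)
        for name in attributes
    }
    return {
        "avg_attempts": mean(r.attempts for r in results),
        "avg_seconds": mean(r.seconds for r in results),
        "solved_rate": sum(r.solved for r in results) / len(results),
        "attribute_means": attribute_means,
        # The attribute with the lowest mean score is the weakest aspect.
        "weakest_attribute": min(attribute_means, key=attribute_means.get),
    }
```

Calling `summarize` on a list of `TaskResult` records returns a dictionary whose `avg_attempts`, `avg_seconds`, and `weakest_attribute` entries correspond to the kinds of figures quoted above; the actual values depend entirely on the recorded data.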

Comments:	The paper is accepted by IEEE AITest 2024 at IEEE CISOSE 2024 Congress as an invited paper, and will appear in the AITest 2024 Conference Proceedings
Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.03130 [cs.SE]
	(or arXiv:2402.03130v3 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2402.03130

Submission history

From: Hong Zhu
[v1] Mon, 5 Feb 2024 15:56:19 UTC (557 KB)
[v2] Tue, 9 Apr 2024 12:37:56 UTC (4,614 KB)
[v3] Tue, 18 Jun 2024 13:45:05 UTC (3,066 KB)
