Search | arXiv e-print repository

End-to-End Software Construction using ChatGPT: An Experience Report

Authors: Mauricio Monteiro, Bruno Castelo Branco, Samuel Silvestre, Guilherme Avelino, Marco Tulio Valente

Abstract: In this paper, we explore the application of Large Language Models (LLMs) in the particular context of end-to-end software construction, i.e., in contexts where software developers have a set of requirements and have to design, implement, test, and validate a new software system. Particularly, we report an experiment where we asked three software developers to use ChatGPT to fully implement a Web-based application using mainstream software architectures and technologies. After that, we compare the apps produced by ChatGPT with a reference implementation that we manually implemented for our research. As a result, we document four categories of prompts that can be used by developers in similar contexts, including initialization prompts, feature requests, bug-fixing, and layout prompts. Additionally, we discuss the advantages and disadvantages of two prompt construction approaches: top-down (where we start with a high-level description of the target software, typically in the form of user stories) and bottom-up (where we request the construction of the system feature by feature).

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2208.07501 [pdf, other]

Identifying Source Code File Experts

Authors: Otávio Cury, Guilherme Avelino, Pedro Santos Neto, Ricardo Britto, Marco Túlio Valente

Abstract: In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug fixes. Although some studies have proposed repository mining techniques to automatically identify source code experts, there are still gaps in this area that can be explored, for example, investigating new variables related to source code knowledge and applying machine learning to improve the performance of techniques that identify source code experts. The goal of this study is to investigate opportunities to improve the performance of existing techniques for recommending source code file experts. We built an oracle by collecting data from the development history and surveying developers of 113 software projects. Then, we use this oracle to: (i) analyze the correlation between measures extracted from the development history and the developers' source code knowledge and (ii) investigate the use of machine learning classifiers by evaluating their performance in identifying source code file experts. First Authorship and Recency of Modification are the variables with the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques (F-Measure = 71% to 73%) in the public dataset, but this advantage is not clear in the private dataset, with F-Measure ranging from 55% to 68% for the linear techniques and 58% to 67% for ML techniques.
Overall, the linear techniques and the machine learning classifiers achieved similar performance, particularly in terms of F-Measure. However, machine learning classifiers usually achieve higher precision, while linear techniques obtain the highest recall values.
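Why similar F-Measure values can hide opposite precision/recall profiles can be seen from the metric itself. The sketch below assumes "F-Measure" means the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and the numbers in the usage note are illustrative only and are not taken from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-Measure (F1): harmonic mean of precision and recall.

    Assumes F1; a weighted F-beta score would not be symmetric.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: a high-precision and a high-recall technique
print(f_measure(0.8, 0.6))
print(f_measure(0.6, 0.8))
```

Because F1 is symmetric in its two arguments, a technique with precision 0.8 and recall 0.6 scores exactly the same as one with precision 0.6 and recall 0.8 (about 0.686), even though they err in opposite directions.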

Submitted 15 August, 2022; originally announced August 2022.

Comments: Accepted at 16th International Symposium on Empirical Software Engineering and Measurement (ESEM), 12 pages, 2022

arXiv:1906.08058 [pdf, other]

On the abandonment and survival of open source projects: An empirical investigation

Authors: Guilherme Avelino, Eleni Constantinou, Marco Tulio Valente, Alexander Serebrenik

Abstract: Background: The evolution of open source projects frequently depends on a small number of core developers. The loss of such core developers might be detrimental to projects and even threaten their continuation. However, it is possible that new core developers take over project maintenance and allow the project to survive. Aims: The objective of this paper is to provide empirical evidence on: 1) the frequency of project abandonment and survival, 2) the differences between abandoned and surviving projects, and 3) the motivation and difficulties faced when taking over an abandoned project. Method: We adopt a mixed-methods approach to investigate project abandonment and survival. We carefully select 1,932 popular GitHub projects, recover the abandoned and surviving projects, and conduct a survey with developers who have been instrumental in the survival of the projects. Results: We found that 315 projects (16%) were abandoned and 128 of these projects (41%) survived because of new core developers who took over project development. The survey indicates that (i) in most cases the new maintainers were aware of the project abandonment risks when they started to contribute; (ii) their own usage of the systems is the main motivation to contribute to such projects; (iii) human and social factors played a key role when making these contributions; and (iv) lack of time and the difficulty of obtaining push access to the repositories are the main barriers they faced.
Conclusions: Project abandonment is a reality even in large open source projects, and our work enables a better understanding of such risks, as well as highlights ways to avoid them.

Submitted 19 June, 2019; originally announced June 2019.

Comments: 11 pages, 12 figures

arXiv:1703.02925 [pdf, other]

doi 10.1007/978-3-319-57735-7_15

Assessing Code Authorship: The Case of the Linux Kernel

Authors: Guilherme Avelino, Leonardo Passos, Andre Hora, Marco Tulio Valente

Abstract: Code authorship is key information in large-scale open source systems. Among other things, it allows maintainers to assess the division of work and identify key collaborators. Interestingly, open-source communities lack guidelines on how to manage authorship. This could be mitigated by setting out to build an empirical body of knowledge on how authorship-related measures evolve in successful open-source communities. Towards that direction, we perform a case study on the Linux kernel. Our results show that: (a) only a small portion of developers (26%) makes significant contributions to the code base; (b) the distribution of the number of files per author is highly skewed: a small group of top authors (3%) is responsible for hundreds of files, while most authors (75%) are responsible for at most 11 files; (c) most authors (62%) have a specialist profile; (d) authors with a high number of co-authorship connections tend to collaborate with others with fewer connections.

Submitted 8 March, 2017; originally announced March 2017.

Comments: Accepted at 13th International Conference on Open Source Systems (OSS). 12 pages

arXiv:1604.06766 [pdf, other]

doi 10.1109/ICPC.2016.7503718

A Novel Approach for Estimating Truck Factors

Authors: Guilherme Avelino, Leonardo Passos, Andre Hora, Marco Tulio Valente

Abstract: Truck Factor (TF) is a metric proposed by the agile community as a tool to identify concentration of knowledge in software development environments. It states the minimal number of developers that have to be hit by a truck (or quit) before a project is incapacitated. In other words, TF helps to measure how prepared a project is to deal with developer turnover. Despite its clear relevance, few studies explore this metric. Moreover, there is no consensus about how to calculate it, and no supporting evidence backing estimates for systems in the wild. To mitigate both issues, we propose a novel (and automated) approach for estimating TF values, which we execute against a corpus of 133 popular projects on GitHub. We later survey developers as a means to assess the reliability of our results. Among other findings, the majority of our target systems (65%) have TF <= 2. Surveying developers from 67 target systems provides confidence in our estimates: in 84% of the valid answers we collect, developers agree or partially agree that the TF's authors are the main authors of their systems; in 53% we receive a positive or partially positive answer regarding our estimated truck factors.
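The abstract defines TF but does not spell out the estimation algorithm. As a rough illustration only, the sketch below uses a common greedy formulation, which is an assumption here and not necessarily the paper's exact approach: given a mapping from each file to its authors (how authorship is determined is assumed to be given), repeatedly remove the developer who authors the most remaining files until more than half of the files have no author left; the number of developers removed is the TF estimate. The function name `truck_factor`, the 50% `threshold`, and the example data are hypothetical.

```python
from collections import Counter


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Greedy Truck Factor estimate (illustrative sketch, not the paper's method).

    file_authors: hypothetical mapping from file path to the set of its authors.
    threshold: assumed fraction of orphaned files at which the project is
    considered incapacitated.
    """
    total = len(file_authors)
    if total == 0:
        return 0
    # Copy the author sets so the caller's data is not mutated.
    remaining = {f: set(a) for f, a in file_authors.items()}
    removed = 0
    while True:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned / total > threshold:
            return removed
        # Count how many still-covered files each developer authors.
        counts = Counter(dev for authors in remaining.values() for dev in authors)
        if not counts:
            return removed
        top_dev, _ = counts.most_common(1)[0]
        # "Remove" the top developer (hit by the truck) from every file.
        for authors in remaining.values():
            authors.discard(top_dev)
        removed += 1


files = {
    "a.c": {"alice"},
    "b.c": {"alice"},
    "c.c": {"alice", "bob"},
    "d.c": {"bob"},
    "e.c": {"carol"},
}
print(truck_factor(files))
```

In this toy project, removing alice orphans two of five files (40%), and removing bob as well orphans four (80%), so the sketch reports a TF of 2. Ties between developers with equal file counts are broken arbitrarily, which a real tool would need to handle deliberately.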

Submitted 22 April, 2016; originally announced April 2016.

Comments: Accepted at 24th International Conference on Program Comprehension (ICPC)

Showing 1–5 of 5 results for author: Avelino, G