Search | arXiv e-print repository

Can GPT-4 Replicate Empirical Software Engineering Research?

Authors: Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann

Abstract: Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research. In this paper, we examine GPT-4's abilities to perform replications of empirical software engineering research on new data. We study its ability to surface assumptions made in empirical software engineering research methodologies, as well as its ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggles to generate ones that apply common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.

Submitted 19 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2110.10248 [pdf]

2020 State of the Octoverse: Finding Balance Between Work and Play

Authors: Nicole Forsgren, Greg Ceccarelli, Derek Jedamski, Scot Kelly, Clair Sullivan

Abstract: Over the past year, many developers and other technology professionals have transitioned to a remote-first world, as COVID-19 pressed organizations to support working from home whenever possible. This shift quickly changed the routines and environments where we work and learn, redrawing the lines between personal and professional lives. How does this affect the ways we develop and deliver software, both at work and in our open source projects?

Submitted 19 October, 2021; originally announced October 2021.

Comments: GitHub

arXiv:2110.10246 [pdf]

2020 State of the Octoverse: Securing the World's Software

Authors: Nicole Forsgren, Bas Alberts, Kevin Backhouse, Grey Baker, Greg Ceccarelli, Derek Jedamski, Scot Kelly, Clair Sullivan

Abstract: Open source is the connective tissue for much of the information economy. You would be hard-pressed to find a scenario where your data does not pass through at least one open source component. Many of the services and technology we all rely on, from banking to healthcare, also rely on open source software. The artifacts of open source code serve as critical infrastructure for much of the global economy, making the security of open source software mission-critical to the world.

Submitted 19 October, 2021; originally announced October 2021.

Comments: published by GitHub

arXiv:2110.08403 [pdf, other]

Nalanda: A Socio-Technical Graph for Building Software Analytics Tools at Enterprise Scale

Authors: Chandra Maddila, Suhas Shanbhogue, Apoorva Agrawal, Thomas Zimmermann, Chetan Bansal, Nicole Forsgren, Divyanshu Agrawal, Kim Herzig, Arie van Deursen

Abstract: Software development is information-dense knowledge work that requires collaboration with other developers and awareness of artifacts such as work items, pull requests, and files. With the speed of development increasing, information overload is a challenge for people developing and maintaining these systems. Finding information and people is difficult for software engineers, especially when they work in large software systems or have just recently joined a project. In this paper, we build a large-scale data platform named the Nalanda platform, which contains two subsystems: (1) a large-scale socio-technical graph system, named the Nalanda graph system, and (2) a large-scale recommendation system, named the Nalanda index system, that aims at satisfying the information needs of software developers. The Nalanda graph is an enterprise-scale graph with data from 6,500 repositories, with 37,410,706 nodes and 128,745,590 edges. On top of the Nalanda graph system, we built software analytics applications including a newsfeed named MyNalanda, and based on organic growth alone, it has Daily Active Users (DAU) of 290 and Monthly Active Users (MAU) of 590. A preliminary user study shows that 74% of developers and engineering managers surveyed are favorable toward continued use of the platform for information discovery. The Nalanda index system comprises two indices: an artifact index and an expert index. It uses the socio-technical graph (Nalanda graph system) to rank the results and provide better recommendations to software developers. A large-scale quantitative evaluation shows that the Nalanda index system provides recommendations with an accuracy of 78% for the top three recommendations.

Submitted 19 September, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Showing 1–4 of 4 results for author: Forsgren, N