Showing 1–2 of 2 results for author: Caddy, J

Search v0.5.6 released 2020-02-24

arXiv:2405.10891 [pdf, other]

cs.SE

doi 10.1145/3663533.3664041

Prioritising GitHub Priority Labels

Authors: James Caddy, Christoph Treude

Abstract: Communities on GitHub often use issue labels as a way of triaging issues by assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and not standardised by GitHub. This makes it difficult for priority-related reasoning across repositories for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences for those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority; normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that will create a list of the highest priority issues from the repositories to which they contribute. We have released the data set and the tool for anyone to use on Zenodo because we hope that this will help the open source community address high-priority issues more effectively and inspire other uses.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 4 pages, 5 tables, 2 figures, appearing in PROMISE 2024

arXiv:2204.07363 [pdf, ps, other]

cs.CL cs.SE

Is Surprisal in Issue Trackers Actionable?

Authors: James Caddy, Markus Wagner, Christoph Treude, Earl T. Barr, Miltiadis Allamanis

Abstract: Background. From information theory, surprisal is a measurement of how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is constructed with the probability of an event occurring, it is therefore possible to determine the surprisal associated with English sentences. The issues and pull requests of software repository issue trackers give insight into the development process and likely contain the surprising events of this process. Objective. Prior works have identified that unusual events in software repositories are of interest to developers, and use simple code metrics-based methods for detecting them. In this study we will propose a new method for unusual event detection in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to further analyse them to determine if they actually hold importance in a repository, or if they pose a significant challenge to address. If it is possible to find bad surprises early, or before they cause additional troubles, it is plausible that effort, cost and time will be saved as a result. Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.

Submitted 15 April, 2022; originally announced April 2022.

Comments: 8 pages, 1 figure. Submitted to 2022 International Conference on Mining Software Repositories Registered Reports track

ACM Class: H.3.3; I.2.7
