-
Can We Identify Stack Overflow Questions Requiring Code Snippets? Investigating the Cause & Effect of Missing Code Snippets
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
On the Stack Overflow (SO) Q&A site, users often request solutions to their code-related problems (e.g., errors, unexpected behavior). Unfortunately, they often miss required code snippets during their question submission, which could prevent their questions from getting prompt and appropriate answers. In this study, we conduct an empirical study investigating the cause & effect of missing code snippets in SO questions whenever required. Here, our contributions are threefold. First, we analyze how the presence or absence of required code snippets affects the correlation between question types (missed code, included code after requests & had code snippets during submission) and corresponding answer meta-data (e.g., presence of an accepted answer). According to our analysis, the chance of getting accepted answers is three times higher for questions that include required code snippets during their question submission than those that missed the code. We also investigate whether the confounding factors (e.g., user reputation) affect questions receiving answers besides the presence or absence of required code snippets. We found that such factors do not hurt the correlation between the presence or absence of required code snippets and answer meta-data. Second, we surveyed 64 practitioners to understand why users miss necessary code snippets. About 60% of them agree that users are unaware of whether their questions require any code snippets. Third, we thus extract four text-based features (e.g., keywords) and build six ML models to identify the questions that need code snippets. Our models can predict the target questions with 86.5% precision, 90.8% recall, 85.3% F1-score, and 85.2% overall accuracy. Our work has the potential to save significant time in programming question-answering and improve the quality of the valuable knowledge base by decreasing unanswered and unresolved questions.
Submitted 6 February, 2024;
originally announced February 2024.
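For readers less familiar with the four metrics quoted in the abstract above, the sketch below shows how precision, recall, F1-score, and accuracy are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the paper's data, and the paper's own evaluation procedure may differ.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are hypothetical counts, e.g. of questions predicted to
    need (or not need) a code snippet.
    """
    total = tp + fp + fn + tn
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```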
-
Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution
Authors:
Saikat Mondal,
Suborno Deb Bappon,
Chanchal K. Roy
Abstract:
Prompt design plays a crucial role in shaping the efficacy of ChatGPT, influencing the model's ability to extract contextually accurate responses. Thus, optimal prompt construction is essential for maximizing the utility and performance of ChatGPT. However, sub-optimal prompt design may necessitate iterative refinement, as imprecise or ambiguous instructions can lead to undesired responses from ChatGPT. Existing studies explore several prompt patterns and strategies to improve the relevance of responses generated by ChatGPT. However, the exploration of constraints that necessitate the submission of multiple prompts is still an unmet attempt. In this study, our contributions are twofold. First, we attempt to uncover gaps in prompt design that demand multiple iterations. In particular, we manually analyze 686 prompts that were submitted to resolve issues related to Java and Python programming languages and identify eleven prompt design gaps (e.g., missing specifications). Such gap exploration can enhance the efficacy of single prompts in ChatGPT. Second, we attempt to reproduce the ChatGPT response by consolidating multiple prompts into a single one. We can completely consolidate prompts with four gaps (e.g., missing context) and partially consolidate prompts with three gaps (e.g., additional functionality). Such an effort provides concrete evidence to users to design more optimal prompts mitigating these gaps. Our study findings and evidence can (a) save users time, (b) reduce costs, and (c) increase user satisfaction.
Submitted 6 February, 2024;
originally announced February 2024.
-
Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study
Authors:
Joy Krishan Das,
Saikat Mondal,
Chanchal K. Roy
Abstract:
Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers are turning to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of study on how developers explicitly utilize ChatGPT to resolve issues in their tracking system. Hence, this study aims to examine the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution. In addition, we assess the code reliability by confirming if the code produced by ChatGPT was integrated into the project's codebase using the clone detection tool NiCad. Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.
Submitted 6 February, 2024;
originally announced February 2024.
-
Investigating Technology Usage Span by Analyzing Users' Q&A Traces in Stack Overflow
Authors:
Saikat Mondal,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. The selection of inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may lead to a complex learning curve and, thus, be more challenging. Therefore, it is crucial for software developers to find technologies that have a high usage span. Intuitively, the usage span of a technology can be determined by the time span developers have used that technology. Existing literature focuses on the technology landscape to explore the complex and implicit dependencies among technologies but lacks formal studies to draw insights about their usage span. This paper investigates the technology usage span by analyzing the question and answering (Q&A) traces of Stack Overflow (SO), the largest technical Q&A website available to date. In particular, we analyze 6.7 million Q&A traces posted by about 97K active SO users and see what technologies have appeared in their questions or answers over 15 years. According to our analysis, C# and Java programming languages have a high usage span, followed by JavaScript. Besides, developers used the .NET framework, iOS & Windows Operating Systems (OS), and SQL query language for a long time (on average). Our study also exposes the emerging (i.e., newly growing) technologies. For example, usages of technologies such as SwiftUI, .NET-6.0, Visual Studio 2022, and Blazor WebAssembly framework are increasing. The findings from our study can assist novice developers, startup software industries, and software users in determining appropriate technologies. This also establishes an initial benchmark for future investigation on the use span of software technologies.
Submitted 5 December, 2023;
originally announced December 2023.
-
Unveiling the potential of large language models in generating semantic and cross-language clones
Authors:
Palash R. Roy,
Ajmain I. Alam,
Farouq Al-omari,
Banani Roy,
Chanchal K. Roy,
Kevin A. Schneider
Abstract:
Semantic and Cross-language code clone generation may be useful for code reuse, code comprehension, refactoring and benchmarking. OpenAI's GPT model has potential in such clone generation as GPT is used for text generation. When developers copy/paste codes from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance. In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment. We have comprised a diverse set of code fragments and assessed GPT-3's performance in generating code variants. Through extensive experimentation and analysis, where 9 judges spent 158 hours to validate, we investigate the model's ability to produce accurate and semantically correct variants. Our findings shed light on GPT-3's strengths in code generation, offering insights into the potential applications and challenges of using advanced language models in software development. Our quantitative analysis yields compelling results. In the realm of semantic clones, GPT-3 attains an impressive accuracy of 62.14% and 0.55 BLEU score, achieved through few-shot prompt engineering. Furthermore, the model shines in transcending linguistic confines, boasting an exceptional 91.25% accuracy in generating cross-language clones.
Submitted 12 September, 2023;
originally announced September 2023.
-
GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench
Authors:
Ajmain Inqiad Alam,
Palash Ranjan Roy,
Farouq Al-omari,
Chanchal Kumar Roy,
Banani Roy,
Kevin Schneider
Abstract:
With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to utilize Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, it's worth noting that BigCloneBench, originally not designed for semantic clone detection, presents several limitations that hinder its suitability as a comprehensive training dataset for this specific purpose. Furthermore, the CLCDSA dataset suffers from a lack of reusable examples aligning with real-world software systems, rendering it inadequate for cross-language clone detection approaches. In this work, we present a comprehensive semantic clone and cross-language clone benchmark, GPTCloneBench, by exploiting SemanticCloneBench and OpenAI's GPT-3 model. In particular, using code fragments from SemanticCloneBench as sample inputs along with appropriate prompt engineering for the GPT-3 model, we generate semantic and cross-language clones for these specific fragments and then conduct a combination of extensive manual analysis, tool-assisted filtering, functionality testing and automated validation in building the benchmark. From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python). Our benchmark is 15-fold larger than SemanticCloneBench, has more functional code examples for software systems and programming language support than CLCDSA, and overcomes BigCloneBench's qualities, quantification, and language variety limitations.
Submitted 1 September, 2023; v1 submitted 26 August, 2023;
originally announced August 2023.
-
Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
In Stack Overflow (SO), the quality of posts (i.e., questions and answers) is subjectively evaluated by users through a voting mechanism. The net votes (upvotes - downvotes) obtained by a post are often considered an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e., verified solutions) also do not score the maximum votes. All these counter-intuitive findings cast doubts on the reliability of the evaluation mechanism employed at SO. Moreover, many users raise concerns against the evaluation, especially downvotes to their posts. Therefore, rigorous verification of the subjective evaluation is highly warranted to ensure a non-biased and reliable quality assessment mechanism. In this paper, we compare the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. According to our investigation, four objective metrics agree with the subjective evaluation, two do not agree, one either agrees or disagrees, and the remaining three neither agree nor disagree with the subjective evaluation. We then develop machine learning models to classify the promoted and discouraged questions. Our models outperform the state-of-the-art models with a maximum of about 76% - 87% accuracy.
Submitted 7 April, 2023;
originally announced April 2023.
-
Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection
Authors:
Subroto Nag Pinku,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also emerging rapidly. The popularity of deep learning-based clone detection tools creates an opportunity to investigate how known strategies that boost the performances of deep learning models could be further leveraged to improve clone detection tools. In this paper, we investigate such a strategy, data augmentation, which has not yet been explored for cross-language clone detection as opposed to single-language clone detection. We show how the existing knowledge on transcompilers (source-to-source translators) can be used for data augmentation to boost the performance of cross-language clone detection models, as well as to adapt single-language clone detection models to create cross-language clone detection pipelines. To demonstrate the performance boost for cross-language clone detection through data augmentation, we exploit Transcoder, which is a pre-trained source-to-source translator. To show how to extend single-language models for cross-language clone detection, we extend a popular single-language model, Graph Matching Network (GMN) in a combination with the transcompilers. We evaluated our models on popular benchmark datasets. Our experimental results showed improvements in F1 scores (sometimes up to 3%) for the cutting-edge cross-language clone detection models. Even when extending GMN for cross-language clone detection, the models built leveraging data augmentation outperformed the baseline with scores of 0.90, 0.92, and 0.91 for precision, recall, and F1 score, respectively.
Submitted 2 March, 2023;
originally announced March 2023.
-
OCFormer: One-Class Transformer Network for Image Classification
Authors:
Prerana Mukherjee,
Chandan Kumar Roy,
Swalpa Kumar Roy
Abstract:
We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. In prior works, there have been tremendous efforts to learn a good representation using varieties of loss functions, which ensures both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is exhaustively experimented on CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method has shown significant improvements over competing CNN based one-class classifier approaches.
Submitted 25 April, 2022;
originally announced April 2022.
-
Backports: Change Types, Challenges and Strategies
Authors:
Debasish Chakroborti,
Kevin A. Schneider,
Chanchal K. Roy
Abstract:
Source code repositories allow developers to manage multiple versions (or branches) of a software system. Pull-requests are used to modify a branch, and backporting is a regular activity used to port changes from a current development branch to other versions. In open-source software, backports are common and often need to be adapted by hand, which motivates us to explore backports and backporting challenges and strategies. In our exploration of 68,424 backports from 10 GitHub projects, we found that bug, test, document, and feature changes are commonly backported. We identified a number of backporting challenges, including that backports were inconsistently linked to their original pull-request (49%), that backports had incompatible code (13%), that backports failed to be accepted (10%), and that there were backporting delays (16 days to create, 5 days to merge). We identified some general strategies for addressing backporting issues. We also noted that backporting strategies depend on the project type and that further investigation is needed to determine their suitability. Furthermore, we created the first-ever backports dataset that can be used by other researchers and practitioners for investigating backports and backporting.
Submitted 7 April, 2022;
originally announced April 2022.
-
Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction
Authors:
Md Nadim,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
The most common use of data visualization is to minimize the complexity for proper understanding. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that is challenging to comprehend if kept in a textual format. In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph to identify Just-in-Time (JIT) bug prediction in software systems during different revisions of software evolution and maintenance. We presented a method to convert the source codes of commit patches to equivalent graph representations and named it Source Code Graph (SCG). To understand and compare multiple source code graphs, we extracted several structural properties of these graphs, such as the density, number of cycles, nodes, edges, etc. We then utilized the attribute values of those SCGs to visualize and detect buggy software commits. We process more than 246K software commits from 12 subject systems in this investigation. Our investigation on these 12 open-source software projects written in C++ and Java programming languages shows that if we combine the features from SCG with conventional features used in similar studies, we will get the increased performance of Machine Learning (ML) based buggy commit detection models. We also find the increase of F1 scores in predicting buggy and non-buggy commits statistically significant using the Wilcoxon Signed Rank Test. Since SCG-based feature values represent the style or structural properties of source code updates or changes in the software system, it suggests the importance of careful maintenance of source code style or structure for keeping a software system bug-free.
Submitted 25 January, 2022;
originally announced January 2022.
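Some of the structural properties named in the abstract above (node count, edge count, density) can be illustrated with a short sketch. This is not the authors' SCG construction: the edge-list input, the function name, and the choice of the simple directed-graph density formula m / (n(n-1)) are all assumptions made here for illustration.

```python
def graph_properties(edges):
    """Basic structural properties of a directed graph given as (u, v) pairs.

    Assumption: the graph is treated as a simple directed graph, so
    duplicate edges and self-loops are ignored when counting edges.
    """
    # Every endpoint that appears in any edge is a node.
    nodes = {vertex for edge in edges for vertex in edge}
    # Deduplicate edges and drop self-loops.
    unique_edges = {(u, v) for u, v in edges if u != v}
    n, m = len(nodes), len(unique_edges)
    # Directed density: actual edges over the n * (n - 1) possible ones.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density}


if __name__ == "__main__":
    # Hypothetical three-node cycle, e.g. three code elements that
    # reference each other in a ring.
    print(graph_properties([("a", "b"), ("b", "c"), ("c", "a")]))
```

Feature values of this kind could then be placed alongside conventional commit features in a feature vector, which is the general idea the abstract describes.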
-
Evaluating the Performance of Clone Detection Tools in Detecting Cloned Co-change Candidates
Authors:
Md Nadim,
Manishankar Mondal,
Chanchal K. Roy,
Kevin Schneider
Abstract:
Co-change candidates are the group of code fragments that require a change if any of these fragments experience a modification in a commit operation during software evolution. The cloned co-change candidates are a subset of the co-change candidates, and the members in this subset are clones of one another. The cloned co-change candidates are usually created by reusing existing code fragments in a software system. Detecting cloned co-change candidates is essential for clone-tracking, and studies have shown that we can use clone detection tools to find cloned co-change candidates. However, although several studies evaluate clone detection tools for their accuracy in detecting cloned fragments, we found no study that evaluates clone detection tools for detecting cloned co-change candidates. In this study, we explore the dimension of code clone research for detecting cloned co-change candidates. We compare the performance of 12 different configurations of nine promising clone detection tools in identifying cloned co-change candidates from eight open-source C and Java-based subject systems of various sizes and application domains. A ranked list and analysis of the results provides valuable insights and guidelines into selecting and configuring a clone detection tool for identifying co-change candidates and leads to a new dimension of code clone research into change impact analysis.
Submitted 19 January, 2022;
originally announced January 2022.
-
The Reproducibility of Programming-Related Issues in Stack Overflow Questions
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy,
Kevin Schneider
Abstract:
Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and the description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments that may impede questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues. The outcomes of our study are three-fold. First, we found that we can reproduce approximately 68% of Java and 71% of Python issues, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues using the code segments. Of the issues that were reproducible, approximately 67% of the Java code segments and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines for writing effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a reproducible question has at least two times higher chance of receiving an accepted answer than an irreproducible question. Besides, the median time delay in receiving accepted answers is double if the issues reported in questions could not be reproduced. We also investigate the confounding factors (e.g., reputation) and find that confounding factors do not hurt the correlation between reproducibility status and answer meta-data.
Submitted 25 December, 2021; v1 submitted 23 November, 2021;
originally announced November 2021.
-
An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets
Authors:
Gias Uddin,
Yann-Gael Gueheneuc,
Foutse Khomh,
Chanchal K Roy
Abstract:
Sentiment analysis in software engineering (SE) has shown promise to analyze and support diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sentiment detection tools from two recently published papers by Lin et al. [31, 32], who first reported negative results with standalone sentiment detectors and then proposed an improved SE-specific sentiment detector, POME [31]. We report the study results on 17,581 units (sentences/documents) coming from six currently available sentiment benchmarks for SE. We find that the existing tools can be complementary to each other in 85-95% of the cases, i.e., one is wrong, but another is right. However, a majority voting-based ensemble of those tools fails to improve the accuracy of sentiment detection. We develop Sentisead, a supervised tool by combining the polarity labels and bag of words as features. Sentisead improves the performance (F1-score) of the individual tools by 4% (over Senti4SD [5]) - 100% (over POME [31]). In a second phase, we compare and improve Sentisead infrastructure using Pre-trained Transformer Models (PTMs). We find that a Sentisead infrastructure with RoBERTa as the ensemble of the five stand-alone rule-based and shallow learning SE-specific tools from Lin et al. [31, 32] offers the best F1-score of 0.805 across the six datasets, while a stand-alone RoBERTa shows an F1-score of 0.801.
Submitted 4 November, 2021;
originally announced November 2021.
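The majority-voting baseline mentioned in the abstract above can be pictured with a minimal sketch. The label names, the tie-breaking rule (fall back to "neutral"), and the function name are assumptions introduced here; the paper may define its ensemble differently.

```python
from collections import Counter


def majority_vote(labels, fallback="neutral"):
    """Return the most frequent polarity label among the detectors' outputs.

    Assumption: when there is no vote or the top two labels tie, the
    hypothetical `fallback` label is returned.
    """
    ranked = Counter(labels).most_common()
    if not ranked:
        return fallback
    # A tie between the two most frequent labels yields no clear majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical outputs of five stand-alone detectors for one sentence.
    votes = ["positive", "negative", "positive", "neutral", "positive"]
    print(majority_vote(votes))
```

A supervised ensemble, as the abstract describes for Sentisead, would instead feed the individual labels (together with other features) into a trained classifier rather than counting votes.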
-
Semantic Slicing of Architectural Change Commits: Towards Semantic Design Review
Authors:
Amit Kumar Mondal,
Chanchal K. Roy,
Kevin A. Schneider,
Banani Roy,
Sristy Sumana Nath
Abstract:
Software architectural changes involve more than one module or component and are complex to analyze compared to local code changes. Development teams aiming to review architectural aspects (design) of a change commit consider many essential scenarios such as access rules and restrictions on usage of program entities across modules. Moreover, design review is essential when proper architectural formulations are paramount for developing and deploying a system. Untangling architectural changes, recovering semantic design, and producing design notes are the crucial tasks of the design review process. To support these tasks, we construct a lightweight tool [4] that can detect and decompose semantic slices of a commit containing architectural instances. A semantic slice consists of a description of relational information of involved modules, their classes, methods and connected modules in a change instance, which is easy to understand to a reviewer. We extract various directory and naming structures (DANS) properties from the source code for developing our tool. Utilizing the DANS properties, our tool first detects architectural change instances based on our defined metric and then decomposes the slices (based on string processing). Our preliminary investigation with ten open-source projects (developed in Java and Kotlin) reveals that the DANS properties produce highly reliable precision and recall (93-100%) for detecting and generating architectural slices. Our proposed tool will serve as the preliminary approach for the semantic design recovery and design summary generation for the project releases.
Submitted 1 September, 2021;
originally announced September 2021.
-
A Systematic Review of Automated Query Reformulations in Source Code Search
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trials and errors during a code search. Over the years, many studies attempt to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.
Submitted 8 June, 2023; v1 submitted 22 August, 2021;
originally announced August 2021.
-
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study
Authors:
Mohammad Masudur Rahman,
Foutse Khomh,
Shamima Yeasmin,
Chanchal K. Roy
Abstract:
Being light-weight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on their used bug reports. A significant number of bug reports contain only plain natural language texts. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. On the other hand, there is a piece of recent evidence that suggests that even these natural language-only reports contain enough good keywords that could help localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source for good query keywords. On the other hand, they cast serious doubt on the query selection practices in the IR-based bug localization. In this article, we attempted to clear the sky on this aspect by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit the Genetic Algorithm-based approach to construct optimal, near-optimal search queries from these bug reports, and then answer three research questions. We confirmed that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports although they contain such queries. We also demonstrate that optimal queries and non-optimal queries chosen from bug report texts are significantly different in terms of several keyword characteristics, which has led us to actionable insights. Furthermore, we demonstrate 27%--34% improvement in the performance of non-optimal queries through the application of our actionable insights to them.
Submitted 11 August, 2021;
originally announced August 2021.
-
Mining API Usage Scenarios from Stack Overflow
Authors:
Gias Uddin,
Foutse Khomh,
Chanchal K Roy
Abstract:
We propose a framework to mine API usage scenarios from Stack Overflow. Each task consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task by summarizing the discussions around the code example. Third, we automatically associate developers reactions (i.e., positive and negative opinions) towards the code example to offer information about code quality. We evaluate the algorithms using three benchmarks.
Submitted 17 February, 2021;
originally announced February 2021.
-
Automatic API Usage Scenario Documentation from Technical Q&A Sites
Authors:
Gias Uddin,
Foutse Khomh,
Chanchal K Roy
Abstract:
The online technical Q&A site Stack Overflow (SO) is popular among developers to support their coding and diverse development needs. To address shortcomings in API official documentation resources, several research efforts have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. The techniques propose to add code examples/insights about APIs into their official documentation. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this paper, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm is called statistical documentation, which shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm is called concept-based documentation, which clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform to aggregate information about APIs from online forums. We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on produced documentation from the posts.
Submitted 16 February, 2021;
originally announced February 2021.
-
A Survey on the Evaluation of Clone Detection Performance and Benchmarking
Authors:
Jeffrey Svajlenko,
Chanchal K. Roy
Abstract:
There are a great many clone detection tools proposed in the literature. In this paper, we investigate the state of clone detection tool evaluation. We begin by surveying the clone detection benchmarks, and performing a multi-faceted evaluation and comparison of their features and capabilities. We then survey the existing clone detection tool and technique publications, and evaluate how the authors of these works evaluate their own tools/techniques. We rank the individual works by how well they measure recall, precision, execution time and scalability. We select the works that best evaluate all four metrics as exemplars that should be considered by future researchers publishing clone detection tools/techniques when designing the empirical evaluation of their tool/technique. We measure statistics on tool evaluation by the authors, and find that evaluation is poor amongst the authors. We finish our investigation into clone detection evaluation by surveying the existing tool comparison studies, including both the qualitative and quantitative studies.
Submitted 28 June, 2020;
originally announced June 2020.
-
The Vision of Software Clone Management: Past, Present, and Future
Authors:
Chanchal K. Roy,
Minhaz F. Zibran,
Rainer Koschke
Abstract:
Duplicated code or code clones are a kind of code smell that have both positive and negative impacts on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while research in recent years extends to the whole spectrum of clone management. In the last decade, three surveys appeared in the literature, which cover the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey on the state of the art in clone management, with in-depth investigation of clone management activities (e.g., tracing, refactoring, cost-benefit analysis) beyond the detection and analysis. This is the first survey on clone management, where we point to the achievements so far, and reveal avenues for further research necessary towards an integrated clone management system. We believe that we have done a good job in surveying the area of clone management and that this work may serve as a kind of roadmap for future research in the area.
Submitted 3 May, 2020;
originally announced May 2020.
-
An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset
Authors:
Avijit Bhattacharjee,
Sristy Sumana Nath,
Shurui Zhou,
Debasish Chakroborti,
Banani Roy,
Chanchal K. Roy,
Kevin Schneider
Abstract:
The fork-based development mechanism provides the flexibility and the unified processes for software teams to collaborate easily in a distributed setting without too much coordination overhead. Currently, multiple social coding platforms support fork-based development, such as GitHub, GitLab, and Bitbucket. Although these different platforms virtually share the same features, they have different emphases. As GitHub is the most popular platform and the corresponding data is publicly available, most of the current studies focus on GitHub-hosted projects. However, we observed anecdotal evidence that people are confused about choosing among these platforms, and some projects are migrating from one platform to another, and the reasons behind these activities remain unknown. With the advances of the Software Heritage Graph Dataset (SWHGD), we have the opportunity to investigate the forking activities across platforms. In this paper, we conduct an exploratory study on 10 popular open-source projects to identify cross-platform forks and investigate the motivation behind them. Preliminary results show that cross-platform forks do exist. For the 10 subject systems in this study, we found 81,357 forks in total among which 179 forks are on GitLab. Based on our qualitative analysis, we found that most of the cross-platform forks that we identified are mirrors of the repositories on another platform, but we still find cases that were created due to preference of using certain functionalities (e.g., Continuous Integration (CI)) supported by different platforms. This study lays the foundation of future research directions, such as understanding the differences between platforms and supporting cross-platform collaboration.
Submitted 17 March, 2020;
originally announced March 2020.
-
Micro-level Modularity of Computaion-intensive Programs in Big Data Platforms: A Case Study with Image Data
Authors:
Amit Kumar Mondal,
Banani Roy,
Chanchal K. Roy,
Kevin A. Schneider
Abstract:
With the rapid advancement of Big Data platforms such as Hadoop, Spark, and Dataflow, many tools are being developed that are intended to provide end users with an interactive environment for large-scale data analysis (e.g., IQmulus). However, there are challenges using these platforms. For example, developers find it difficult to use these platforms when developing interactive and reusable data analytic tools. One approach to better support interactivity and reusability is the use of micro-level modularisation for computation-intensive tasks, which splits data operations into independent, composable modules. However, modularizing data and computation-intensive tasks into independent components differs from traditional programming, e.g., when accessing large scale data, controlling data-flow among components, and structuring computation logic. In this paper, we present a case study on modularizing real world computation-intensive tasks that investigates the impact of modularization on processing large scale image data. To that end, we synthesize image data-processing patterns and propose a unified modular model for the effective implementation of computation-intensive tasks on data-parallel frameworks considering reproducibility, reusability, and customization. We present various insights of using the modularity model based on our experimental results from running image processing tasks on Spark and Hadoop clusters.
Submitted 19 October, 2019;
originally announced October 2019.
-
LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach
Authors:
Ming Wu,
Pengcheng Wang,
Kangqi Yin,
Haoyu Cheng,
Yun Xu,
Chanchal K. Roy
Abstract:
To detect large-variance code clones (i.e. clones with relatively more differences) in large-scale code repositories is difficult because most current tools can only detect almost identical or very similar clones. It will make promotion and changes to some software applications such as bug detection, code completion, software analysis, etc. Recently, CCAligner made an attempt to detect clones with relatively concentrated modifications called large-gap clones. Our contribution is to develop a novel and effective detection approach of large-variance clones to more general cases for not only the concentrated code modifications but also the scattered code modifications. A detector named LVMapper is proposed, borrowing and changing the approach of sequencing alignment in bioinformatics which can find two similar sequences with more differences. The ability of LVMapper was tested on both self-synthetic datasets and real cases, and the results show substantial improvement in detecting large-variance clones compared with other state-of-the-art tools including CCAligner. Furthermore, our new tool also presents good recall and precision for general Type-1, Type-2 and Type-3 clones on the widely used benchmarking dataset, BigCloneBench.
Submitted 9 September, 2019;
originally announced September 2019.
-
Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge
Authors:
Rodrigo F. G. Silva,
Chanchal K. Roy,
Mohammad Masudur Rahman,
Kevin A. Schneider,
Klerisson Paixao,
Marcelo de Almeida Maia
Abstract:
Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might miss a succinct explanation. These problems make the developers browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations. Our proposed approach expands the task description with relevant API classes from Stack Overflow Q&A threads and then mitigates the lexical gap problems. Furthermore, we perform natural language processing on the top quality answers and then return such programming solutions containing code examples and code explanations unlike earlier studies. We evaluate our approach using 48 programming queries and show that it outperforms six baselines including the state-of-art by a statistically significant margin. Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-art tool in terms of relevance of the suggested code examples, benefit of the code explanations and the overall solution quality (code + explanation).
Submitted 20 March, 2019; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Improving IR-Based Bug Localization with Context-Aware Query Reformulation
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Recent findings suggest that Information Retrieval (IR)-based bug localization techniques do not perform well if the bug report lacks rich structured information (e.g., relevant program entity names). Conversely, excessive structured information (e.g., stack traces) in the bug report might not always help the automated localization either. In this paper, we propose a novel technique--BLIZZARD--that automatically localizes buggy entities from project source using appropriate query reformulation and effective information retrieval. In particular, our technique determines whether there are excessive program entities or not in a bug report (query), and then applies appropriate reformulations to the query for bug localization. Experiments using 5,139 bug reports show that our technique can localize the buggy source documents with 7%--56% higher Hit@10, 6%--62% higher MAP@10 and 6%--62% higher MRR@10 than the baseline technique. Comparison with the state-of-the-art techniques and their variants reports that our technique can improve 19% in MAP@10 and 20% in MRR@10 over the state-of-the-art, and can improve 59% of the noisy queries and 39% of the poor queries.
Submitted 1 August, 2018;
originally announced August 2018.
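Note on the metrics named in the abstract above: Hit@10, MAP@10 and MRR@10 are standard ranked-retrieval measures. The sketch below shows one common way to compute the per-query quantities behind them. It is an illustration only, not BLIZZARD's evaluation code; the function names, the file names and the choice to normalize average precision by the number of relevant documents found within the top K are all assumptions (some studies normalize by the total number of relevant documents instead).

```python
def hit_at_k(ranked, relevant, k=10):
    """True if at least one relevant document appears in the top k results."""
    return any(doc in relevant for doc in ranked[:k])


def reciprocal_rank_at_k(ranked, relevant, k=10):
    """1 / rank of the first relevant document in the top k, or 0.0 if none."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=10):
    """Mean of precision values taken at each relevant hit within the top k.

    Assumption: normalized by the number of relevant documents retrieved.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical result list for one bug report and its known buggy files.
ranked_files = ["A.java", "B.java", "C.java", "D.java"]
buggy_files = {"B.java", "D.java"}

hit = hit_at_k(ranked_files, buggy_files)
rr = reciprocal_rank_at_k(ranked_files, buggy_files)
ap = average_precision_at_k(ranked_files, buggy_files)
```

Averaging `hit`, `rr` and `ap` over all queries would give Hit@10, MRR@10 and MAP@10 respectively.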
-
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from Stack Overflow Q & A site for a programming task written as a natural language query, and then reformulates the query for improved code search. We first collect candidate API classes from Stack Overflow using pseudo-relevance feedback and two term weighting algorithms, and then rank the candidates using Borda count and semantic proximity between query keywords and the API classes. The semantic proximity has been determined by an analysis of 1.3 million questions and answers of Stack Overflow. Experiments using 310 code search queries report that our technique suggests relevant API classes with 48% precision and 58% recall which are 32% and 48% higher respectively than those of the state-of-the-art. Comparisons with two state-of-the-art studies and three popular search engines (e.g., Google, Stack Overflow, and GitHub native search) report that our reformulated queries (1) outperform the queries of the state-of-the-art, and (2) significantly improve the code search results provided by these contemporary search engines.
Submitted 23 July, 2018;
originally announced July 2018.
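Note on the Borda count mentioned in the abstract above: a Borda count merges several ranked lists by awarding each candidate points according to its position in every list. The sketch below is a minimal illustration of that idea only. It is not the authors' implementation; the function name, the example API classes, the scoring rule (list length minus position) and the alphabetical tie-break are assumptions, and the semantic-proximity component described in the abstract is not modeled.

```python
from collections import defaultdict


def borda_rank(rankings):
    """Combine several ranked candidate lists with a Borda count.

    Each list awards (n - position) points to a candidate, where n is the
    length of that list; a candidate absent from a list gets 0 points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical candidate API classes ranked by two term-weighting schemes.
by_weight_a = ["HttpURLConnection", "URL", "InputStream"]
by_weight_b = ["URL", "HttpURLConnection", "BufferedReader"]

combined = borda_rank([by_weight_a, by_weight_b])
print(combined)
```

Here the two classes that appear near the top of both lists collect the most points, so they lead the combined ranking.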
-
Poster: Improving Bug Localization with Report Quality Dynamics and Query Reformulation
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Recent findings from a user study suggest that IR-based bug localization techniques do not perform well if the bug report lacks rich structured information such as relevant program entity names. On the contrary, excessive structured information such as stack traces in the bug report might not always be helpful for the automated bug localization. In this paper, we conduct a large empirical study using 5,500 bug reports from eight subject systems and replicating three existing studies from the literature. Our findings (1) empirically demonstrate how quality dynamics of bug reports affect the performances of IR-based bug localization, and (2) suggest potential ways (e.g., query reformulations) to overcome such limitations.
Submitted 19 July, 2018;
originally announced July 2018.
-
Improved Query Reformulation for Concept Location using CodeRank and Document Structures
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique --ACER-- that takes an initial query, identifies appropriate search terms from the source code using a novel term weight --CodeRank, and then suggests effective reformulation to the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.
Submitted 12 July, 2018;
originally announced July 2018.
-
Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
Raula G. Kula
Abstract:
Although peer code review is widely adopted in both commercial and open source development, existing studies suggest that such code reviews often contain a significant amount of non-useful review comments. Unfortunately, to date, no tools or techniques exist that can provide automatic support in improving those non-useful comments. In this paper, we first report a comparative study between useful and non-useful review comments where we contrast between them using their textual characteristics, and reviewers' experience. Then, based on the findings from the study, we develop RevHelper, a prediction model that can help the developers improve their code review comments through automatic prediction of their usefulness during review submission. Comparative study using 1,116 review comments suggested that useful comments share more vocabulary with the changed code, contain salient items like relevant code elements, and their reviewers are generally more experienced. Experiments using 1,482 review comments report that our model can predict comment usefulness with 66% prediction accuracy which is promising. Comparison with three variants of a baseline model using a case study validates our empirical findings and demonstrates the potential of our model.
Submitted 12 July, 2018;
originally announced July 2018.
-
RACK: Code Search in the IDE using Crowdsourced Knowledge
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
David Lo
Abstract:
Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus require carefully designed queries containing information about programming APIs for code search. Unfortunately, existing studies suggest that preparing an effective query for code search is both challenging and time consuming for the developers. In this paper, we propose a novel code search tool--RACK--that returns relevant source code for a given code search query written in natural language text. The tool first translates the query into a list of relevant API classes by mining keyword-API associations from the crowdsourced knowledge of Stack Overflow, and then applies the reformulated query to GitHub code search API for collecting relevant results. Once a query related to a programming task is submitted, the tool automatically mines relevant code snippets from thousands of open-source projects, and displays them as a ranked list within the context of the developer's programming environment--the IDE.
Tool page: http://www.usask.ca/~masud.rahman/rack
Submitted 12 July, 2018;
originally announced July 2018.
-
STRICT: Information Retrieval Based Search Term Identification for Concept Location
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts illustrate the change requirement involving various domain related concepts. Software developers need to find appropriate search terms from those concepts so that they could locate the possible locations in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique--STRICT--that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques--TextRank and POSRank. These IR techniques determine a term's importance based on not only its co-occurrences with other important terms but also its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy which are promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
Submitted 12 July, 2018;
originally announced July 2018.
-
CORRECT: Code Reviewer Recommendation at GitHub for Vendasta Technologies
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
Jesse Redl,
Jason A. Collins
Abstract:
Peer code review locates common coding standard violations and simple logical errors in the early phases of software development, and thus reduces overall cost. Unfortunately, at GitHub, identifying an appropriate code reviewer for a pull request is challenging given that reliable information for reviewer identification is often not readily available. In this paper, we propose a code reviewer recommendation tool--CORRECT--that considers not only the relevant cross-project work experience (e.g., external library experience) of a developer but also her experience in certain specialized technologies (e.g., Google App Engine) associated with a pull request for determining her expertise as a potential code reviewer. We design our tool using a client-server architecture, and then package the solution as a Google Chrome plug-in. Once the developer initiates a new pull request at GitHub, our tool automatically analyzes the request, mines two relevant histories, and then returns a ranked list of appropriate code reviewers for the request within the browser's context.
Demo: https://www.youtube.com/watch?v=rXU1wTD6QQ0
Submitted 9 July, 2018;
originally announced July 2018.
-
CORRECT: Code Reviewer Recommendation in GitHub Based on Cross-Project and Technology Experience
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
Jason A. Collins
Abstract:
Peer code review locates common coding rule violations and simple logical errors in the early phases of software development, and thus reduces overall cost. However, in GitHub, identifying an appropriate code reviewer for a pull request is a non-trivial task given that reliable information for reviewer identification is often not readily available. In this paper, we propose a code reviewer recommendation technique that considers not only the relevant cross-project work history (e.g., external library experience) but also the experience of a developer in certain specialized technologies associated with a pull request for determining her expertise as a potential code reviewer. We first motivate our technique using an exploratory study with 10 commercial projects and 10 associated libraries external to those projects. Experiments using 17,115 pull requests from 10 commercial projects and six open source projects show that our technique provides 85%--92% recommendation accuracy, about 86% precision and 79%--81% recall in code reviewer recommendation, which are highly promising. Comparison with the state-of-the-art technique also validates the empirical findings and the superiority of our recommendation technique.
Submitted 9 July, 2018;
originally announced July 2018.
-
QUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
During maintenance, software developers deal with numerous change requests made by the users of a software system. Studies show that developers find it challenging to select appropriate search terms from a change request during concept location. In this paper, we propose a novel technique--QUICKAR--that automatically suggests helpful reformulations for a given query by leveraging the crowdsourced knowledge from Stack Overflow. It determines semantic similarity or relevance between any two terms by analyzing their adjacent word lists from the programming questions of Stack Overflow, and then suggests semantically relevant queries for concept location. Experiments using 510 queries from two software systems suggest that our technique can improve or preserve the quality of 76% of the initial queries on average, which is promising. Comparison with one baseline technique validates our preliminary findings, and also demonstrates the potential of our technique.
Submitted 9 July, 2018;
originally announced July 2018.
-
RACK: Automatic API Recommendation using Crowdsourced Knowledge
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
David Lo
Abstract:
Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus need carefully designed queries containing information about programming APIs for code search. Unfortunately, existing studies suggest that preparing an effective code search query is both challenging and time-consuming for developers. In this paper, we propose a novel API recommendation technique--RACK--that recommends a list of relevant APIs for a natural language query for code search by exploiting keyword-API associations from the crowdsourced knowledge of Stack Overflow. We first motivate our technique using an exploratory study with 11 core Java packages and 344K Java posts from Stack Overflow. Experiments using 150 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the top 10 results for about 79% of the queries, which is highly promising. Comparison with two variants of the state-of-the-art technique also shows that RACK outperforms both of them not only in Top-K accuracy but also in mean average precision and mean recall by a large margin.
Submitted 9 July, 2018;
originally announced July 2018.
-
Recommending Insightful Comments for Source Code using Crowdsourced Knowledge
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy,
Iman Keivanloo
Abstract:
Recently, automatic code comment generation has been proposed to facilitate program comprehension. Existing code comment generation techniques focus on describing the functionality of the source code. However, there are other aspects, such as insights about the quality or issues of the code, which are overlooked by earlier approaches. In this paper, we describe a mining approach that recommends insightful comments about the quality, deficiencies or scope for further improvement of the source code. First, we conduct an exploratory study that motivates crowdsourced knowledge from Stack Overflow discussions as a potential resource for source code comment recommendation. Second, based on the findings from the exploratory study, we propose a heuristic-based technique for mining insightful comments from the Stack Overflow Q & A site for source code comment recommendation. Experiments with 292 Stack Overflow code segments and 5,039 discussion comments show that our approach has a promising recall of 85.42%. We also conducted a complementary user study which confirms the accuracy and usefulness of the recommended comments.
Submitted 6 July, 2018;
originally announced July 2018.
-
Recommending Relevant Sections from a Webpage about Programming Errors and Exceptions
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Programming errors or exceptions are inherent in software development and maintenance, and in today's Internet era, software developers often look to the web to find working solutions. They make use of a search engine for retrieving relevant pages, and then look for the appropriate solutions by manually going through the pages one by one. However, both manually checking a page's content against a given exception (and its context) and then working out an appropriate solution are non-trivial tasks. They are even more complex and time-consuming given the bulk of irrelevant (i.e., off-topic) and noisy (e.g., advertisements) content in the web page. In this paper, we propose an IDE-based and context-aware page content recommendation technique that locates and recommends relevant sections from a given web page by exploiting the technical details, in particular, the context of an encountered exception in the IDE. An evaluation with 250 web pages related to 80 programming exceptions, comparison with the only available closely related technique, and a case study involving comparison with VSM and LSA techniques show that the proposed technique is highly promising in terms of precision, recall and F1-measure.
Submitted 6 July, 2018;
originally announced July 2018.
-
TextRank Based Search Term Identification for Software Change Tasks
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
During maintenance, software developers deal with a number of software change requests. Each of those requests is generally written using natural language texts, and it involves one or more domain related concepts. A developer needs to map those concepts to exact source code locations within the project in order to implement the requested change. This mapping generally starts with a search within the project that requires one or more suitable search terms. Studies suggest that the developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose and evaluate a novel TextRank-based technique that automatically identifies and suggests search terms for a software change task by analyzing its task description. Experiments with 349 change tasks from two subject systems and comparison with one of the latest and closely related state-of-the-art approaches show that our technique is highly promising in terms of suggestion accuracy, mean average precision and recall.
Submitted 6 July, 2018;
originally announced July 2018.
-
On the Use of Context in Recommending Exception Handling Code Examples
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Studies show that software developers often either misuse exception handling features or use them inefficiently, and such a practice may turn an ongoing software project into a fragile, insecure and non-robust application system. In this paper, we propose a context-aware code recommendation approach that recommends exception handling code examples from a number of popular open source code repositories hosted at GitHub. It collects the code examples using the GitHub code search API, and then analyzes, filters and ranks them against the code under development in the IDE by leveraging not only the structural (i.e., graph-based) and lexical features but also the heuristic quality measures of exception handlers in the examples. Experiments with 4,400 code examples and 65 exception handling scenarios as well as comparisons with four existing approaches show that the proposed approach is highly promising.
Submitted 6 July, 2018;
originally announced July 2018.
-
SurfClipse: Context-Aware Meta Search in the IDE
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Despite the debugging support that existing IDEs provide for programming errors and exceptions, software developers often look to the web for working solutions or up-to-date information. Traditional web search does not consider the context of the problems for which they seek solutions, and thus it often does not help much in problem solving. In this paper, we propose a context-aware meta search tool, SurfClipse, that analyzes an encountered exception and its context in the IDE, and recommends not only suitable search queries but also relevant web pages for the exception (and its context). The tool collects results from three popular search engines and a programming Q & A site against the exception in the IDE, refines the results for relevance against the context of the exception, and then ranks them before recommendation. It provides two working modes--interactive and proactive--to meet the versatile needs of the developers, and one can browse the result pages using a customized embedded browser provided by the tool.
Tool page: www.usask.ca/~masud.rahman/surfclipse
Submitted 6 July, 2018;
originally announced July 2018.
-
Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions
Authors:
Mohammad Masudur Rahman,
Shamima Yeasmin,
Chanchal K. Roy
Abstract:
A study shows that software developers spend about 19% of their time looking for information on the web during software development and maintenance. Traditional web search forces them to leave the working environment (e.g., IDE) and look for information in the web browser. It also does not consider the context of the problems for which the developers seek solutions. The frequent switching between the web browser and the IDE is both time-consuming and distracting, and the keyword-based traditional web search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that exploits the APIs provided by three popular web search engines--Google, Yahoo, Bing--and a popular programming Q & A site, Stack Overflow, and captures the content-relevance, context-relevance, popularity and search engine confidence of each candidate result against the encountered programming problems. Experiments with 75 programming errors and exceptions using the proposed approach show that including different types of context information associated with a given exception can enhance the recommendation accuracy for that exception. Experiments with two existing approaches and with existing web search engines confirm that our approach can perform better than them in terms of recall, mean precision and other performance measures with little computational cost.
Submitted 6 July, 2018;
originally announced July 2018.
-
An IDE-Based Context-Aware Meta Search Engine
Authors:
Mohammad Masudur Rahman,
Shamima Yeasmin,
Chanchal K. Roy
Abstract:
Traditional web search forces developers to leave their working environments and look for solutions in web browsers. It often does not consider the context of their programming problems. The context-switching between the web browser and the working environment is time-consuming and distracting, and the keyword-based traditional search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that collects data from three web search APIs--Google, Yahoo, Bing--and a programming Q & A site, Stack Overflow. It then provides search results within the IDE, taking into account not only the content of the selected error but also the problem context, popularity and search engine recommendation of the result links. Experiments with 25 runtime errors and exceptions show that the proposed approach outperforms the keyword-based search approaches with a recommendation accuracy of 96%. We also validate the results with a user study involving five prospective participants where we get a result agreement of 64.28%. While the preliminary results are promising, the approach needs to be further validated with more errors and exceptions followed by a user study with more participants to establish itself as a complete IDE-based web search solution.
Submitted 5 July, 2018;
originally announced July 2018.
-
An Insight into the Pull Requests of GitHub
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Given the increasing number of unsuccessful pull requests in GitHub projects, insights into the success and failure of these requests are essential for the developers. In this paper, we provide a comparative study between successful and unsuccessful pull requests made to 78 GitHub base projects by 20,142 developers from 103,192 forked projects. In the study, we analyze pull request discussion texts, project specific information (e.g., domain, maturity), and developer specific information (e.g., experience) in order to report useful insights, and use them to contrast between successful and unsuccessful pull requests. We believe our study will help developers overcome the issues with pull requests in GitHub, and project administrators with informed decision making.
Submitted 5 July, 2018;
originally announced July 2018.
-
Impact of Continuous Integration on Code Reviews
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Peer code review and continuous integration often interleave with each other in modern software quality management. Although several studies investigate how non-technical factors (e.g., reviewer workload), developer participation and even patch size affect the code review process, the impact of continuous integration on code reviews is not yet properly understood. In this paper, we report an exploratory study using 578K automated build entries where we investigate the impact of automated builds on code reviews. Our investigation suggests that successfully passed builds are more likely to encourage new code review participation in a pull request. Frequently built projects are found to maintain a steady level of reviewing activity over the years, which was largely missing from the rarely built projects. Experiments with 26,516 automated build entries show that our proposed model can identify 64% of the builds that triggered new code reviews later.
Submitted 5 July, 2018;
originally announced July 2018.
-
An Insight into the Unresolved Questions at Stack Overflow
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
For a significant number of questions at Stack Overflow, none of the posted answers were accepted as solutions. Acceptance of an answer indicates that the answer actually solves the discussed problem in the question, and the question is answered sufficiently. In this paper, we investigate 3,956 such unresolved questions using an exploratory study where we analyze four important aspects of those questions, their answers and the corresponding users that partially explain the observed scenario. We then propose a prediction model by employing five metrics related to user behaviour, topics and popularity of question, which predicts if the best answer for a question at Stack Overflow might remain unaccepted or not. Experiments using 8,057 questions show that the model can predict unresolved questions with 78.70% precision and 76.10% recall.
Submitted 5 July, 2018;
originally announced July 2018.
-
SourcererCC: Scaling Code Clone Detection to Big Code
Authors:
Hitesh Sajnani,
Vaibhav Saini,
Jeffrey Svajlenko,
Chanchal K. Roy,
Cristina V. Lopes
Abstract:
Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of token comparisons needed to judge a potential clone.
We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.
Submitted 20 December, 2015;
originally announced December 2015.