
Homophilic organization of egocentric communities in ICT services

Authors: Chandreyee Roy, Hang-Hyun Jo, János Kertész, Kimmo Kaski, János Török

Abstract: Members of a society can be characterized by a large number of features, such as gender, age, ethnicity, religion, social status, and shared activities. One of the main tie-forming factors between individuals in human societies is homophily, the tendency of being attracted to similar others. Homophily has mainly been studied with a focus on one feature at a time, and little is known about the roles of similarities of different origins in the formation of communities. To close this gap, we analyze three datasets from Information and Communications Technology (ICT) services, namely, two online social networks and a network deduced from mobile phone calls, in all of which metadata about individual features are available. We identify communities within egocentric networks and surprisingly find that the larger the community is, the more overlap is found between features of its members and the ego. We interpret this finding in terms of the effort needed to manage the communities: greater diversity requires more effort, so maintaining a large, diverse group may exceed the capacity of its members. As the ego reaches out to her alters on an ICT service, we observe that the first alter in each community tends to have a higher feature overlap with the ego than the rest. Moreover, the feature overlap of the ego with all her alters displays non-monotonic behavior as a function of the ego's degree.
We propose a simple mechanism of how people add links in their egocentric networks of alters that reproduces all the empirical observations and shows the reason behind the non-monotonic tendency of the egocentric feature overlap as a function of the ego's degree.
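The abstract does not define "feature overlap". As a minimal sketch, assuming each person is described by categorical features and overlap is the fraction of jointly known features on which ego and alter agree, the per-community comparison could be computed as below. The `Profile` type and both function names are hypothetical, not the authors' definitions.

```python
from typing import Dict, List

# Hypothetical representation: each person maps a feature name
# (e.g. "gender", "age_group", "city") to a categorical value.
Profile = Dict[str, str]


def feature_overlap(ego: Profile, alter: Profile) -> float:
    """Fraction of features known for both people on which they agree."""
    shared_keys = ego.keys() & alter.keys()
    if not shared_keys:
        return 0.0
    matches = sum(1 for k in shared_keys if ego[k] == alter[k])
    return matches / len(shared_keys)


def mean_overlap(ego: Profile, alters: List[Profile]) -> float:
    """Average ego-alter overlap over one community (or the whole ego network)."""
    if not alters:
        return 0.0
    return sum(feature_overlap(ego, a) for a in alters) / len(alters)
```

Comparing `mean_overlap` across communities of different sizes is one way to look for the size-overlap trend the abstract reports, under the assumed overlap definition.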

Submitted 5 May, 2024; originally announced May 2024.

Comments: 8 pages, 7 figures, 1 table

arXiv:2402.04575 [pdf, other]

Can We Identify Stack Overflow Questions Requiring Code Snippets? Investigating the Cause & Effect of Missing Code Snippets

Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: On the Stack Overflow (SO) Q&A site, users often request solutions to their code-related problems (e.g., errors, unexpected behavior). Unfortunately, they often omit required code snippets when submitting their questions, which could prevent their questions from getting prompt and appropriate answers. In this study, we empirically investigate the cause & effect of missing code snippets in SO questions whenever such snippets are required. Our contributions are threefold. First, we analyze how the presence or absence of required code snippets affects the correlation between question types (missed code, included code after requests & had code snippets during submission) and corresponding answer metadata (e.g., presence of an accepted answer). According to our analysis, the chance of getting accepted answers is three times higher for questions that include required code snippets during their submission than for those that missed the code. We also investigate whether confounding factors (e.g., user reputation), besides the presence or absence of required code snippets, affect whether questions receive answers. We found that such factors do not hurt the correlation between the presence or absence of required code snippets and answer metadata. Second, we surveyed 64 practitioners to understand why users miss necessary code snippets. About 60% of them agree that users are unaware of whether their questions require any code snippets.
Third, we extract four text-based features (e.g., keywords) and build six ML models to identify the questions that need code snippets. Our models can predict the target questions with 86.5% precision, 90.8% recall, 85.3% F1-score, and 85.2% overall accuracy. Our work has the potential to save significant time in programming question-answering and to improve the quality of the valuable knowledge base by decreasing unanswered and unresolved questions.

Submitted 6 February, 2024; originally announced February 2024.

Comments: This paper has been accepted for inclusion in the International Conference on Software Analysis, Evolution, and Reengineering (SANER 2024) technical program

arXiv:2402.04568 [pdf, other]

Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution

Authors: Saikat Mondal, Suborno Deb Bappon, Chanchal K. Roy

Abstract: Prompt design plays a crucial role in shaping the efficacy of ChatGPT, influencing the model's ability to extract contextually accurate responses. Thus, optimal prompt construction is essential for maximizing the utility and performance of ChatGPT. However, sub-optimal prompt design may necessitate iterative refinement, as imprecise or ambiguous instructions can lead to undesired responses from ChatGPT. Existing studies explore several prompt patterns and strategies to improve the relevance of responses generated by ChatGPT. However, the constraints that necessitate the submission of multiple prompts remain unexplored. In this study, our contributions are twofold. First, we attempt to uncover gaps in prompt design that demand multiple iterations. In particular, we manually analyze 686 prompts that were submitted to resolve issues related to the Java and Python programming languages and identify eleven prompt design gaps (e.g., missing specifications). Such gap exploration can enhance the efficacy of single prompts in ChatGPT. Second, we attempt to reproduce the ChatGPT response by consolidating multiple prompts into a single one. We can completely consolidate prompts with four gaps (e.g., missing context) and partially consolidate prompts with three gaps (e.g., additional functionality). Such an effort provides concrete evidence to help users design better prompts that mitigate these gaps. Our study findings and evidence can (a) save users time, (b) reduce costs, and (c) increase user satisfaction.

Submitted 6 February, 2024; originally announced February 2024.

Comments: This paper has been accepted at the 21st International Conference on Mining Software Repositories (MSR 2024)

arXiv:2402.03735 [pdf, other]

Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study

Authors: Joy Krishan Das, Saikat Mondal, Chanchal K. Roy

Abstract: Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers have turned to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of study on how developers explicitly utilize ChatGPT to resolve issues in their tracking system. Hence, this study aims to examine the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution. In addition, we assess code reliability by checking, with the clone detection tool NiCad, whether the code produced by ChatGPT was integrated into the project's codebase. Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their own code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.
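To make the clone-based reliability check concrete, here is a simplified, hypothetical near-miss clone test loosely modeled on NiCad's idea of comparing normalized lines with a longest-common-subsequence measure and a dissimilarity threshold. It is not NiCad's implementation: NiCad also pretty-prints code and can rename or abstract identifiers, and the 0.3 threshold below is an illustrative assumption.

```python
from typing import List


def normalize(fragment: str) -> List[str]:
    """Drop blank lines and collapse whitespace so layout differences are ignored."""
    lines = []
    for line in fragment.splitlines():
        compact = " ".join(line.split())
        if compact:
            lines.append(compact)
    return lines


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two line lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_near_miss_clone(f1: str, f2: str, upi_threshold: float = 0.3) -> bool:
    """Report a clone if, in both fragments, at most `upi_threshold` of the
    lines are unique to that fragment (a unique-percentage-of-items test)."""
    a, b = normalize(f1), normalize(f2)
    if not a or not b:
        return False
    common = lcs_length(a, b)
    upi = max((len(a) - common) / len(a), (len(b) - common) / len(b))
    return upi <= upi_threshold
```

For example, a ChatGPT answer `x = 1 / y = 2 / print(x + y)` and a committed version that reformats whitespace and adds one extra line share 3 of 4 lines, so the larger unique fraction is 0.25 and the pair counts as a clone under this assumed threshold.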

Submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted in MSR 2024

arXiv:2312.03182 [pdf, other]

Investigating Technology Usage Span by Analyzing Users' Q&A Traces in Stack Overflow

Authors: Saikat Mondal, Debajyoti Mondal, Chanchal K. Roy

Abstract: Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. The selection of inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may involve a complex learning curve and, thus, be more challenging. Therefore, it is crucial for software developers to find technologies that have a high usage span. Intuitively, the usage span of a technology can be determined by the time span over which developers have used that technology. Existing literature focuses on the technology landscape to explore the complex and implicit dependencies among technologies but lacks formal studies to draw insights about their usage span. This paper investigates the technology usage span by analyzing the question-and-answer (Q&A) traces of Stack Overflow (SO), the largest technical Q&A website available to date. In particular, we analyze 6.7 million Q&A traces posted by about 97K active SO users and see what technologies have appeared in their questions or answers over 15 years. According to our analysis, the C# and Java programming languages have a high usage span, followed by JavaScript. Besides, developers used the .NET framework, iOS & Windows Operating Systems (OS), and SQL query language for a long time (on average). Our study also exposes the emerging (i.e., newly growing) technologies.
For example, usages of technologies such as SwiftUI, .NET-6.0, Visual Studio 2022, and Blazor WebAssembly framework are increasing. The findings from our study can assist novice developers, startup software industries, and software users in determining appropriate technologies. This also establishes an initial benchmark for future investigation on the usage span of software technologies.
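One way to operationalize "usage span" as described above is sketched below, assuming a user's span for a technology is the number of days between the first and last Q&A post in which that technology appears, averaged across users. The input triple format and the function name are hypothetical; the paper may define and aggregate span differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (user_id, technology_tag, post_date) triples
# extracted from a user's questions and answers.
Trace = Tuple[int, str, date]


def average_usage_span(traces: Iterable[Trace]) -> Dict[str, float]:
    """Mean per-user usage span (in days) for each technology."""
    # Track the first and last date each (user, technology) pair was seen.
    first_last: Dict[Tuple[int, str], Tuple[date, date]] = {}
    for user, tech, day in traces:
        key = (user, tech)
        if key in first_last:
            lo, hi = first_last[key]
            first_last[key] = (min(lo, day), max(hi, day))
        else:
            first_last[key] = (day, day)

    # Collect each user's span per technology, then average across users.
    spans: Dict[str, List[int]] = defaultdict(list)
    for (_, tech), (lo, hi) in first_last.items():
        spans[tech].append((hi - lo).days)
    return {tech: float(mean(values)) for tech, values in spans.items()}
```

Averaging per-user spans (rather than taking one global first/last date per technology) keeps a single long-lived account from dominating the result; that choice is itself an assumption.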

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted in the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

arXiv:2311.13652 [pdf, other]

Differences of communication activity and mobility patterns between urban and rural people

Authors: Fumiko Ogushi, Chandreyee Roy, Kimmo Kaski

Abstract: Human mobility and other social activity patterns influence various aspects of society such as urban planning, traffic predictions, crisis resilience, and epidemic prevention. The behaviour of individuals, like their communication frequencies and movements, is shaped by societal and socio-economic factors. In addition, differences in people's geolocation as well as their gender and age affect their activity patterns. In this study we focus on investigating these patterns by using mobile phone data, specifically the call detail records (CDRs), to analyze the social communication and mobility patterns of people. This dataset can provide us insight into the individual and population-level behaviours in rural and urban environments on a daily, weekly and seasonal basis. The results of our analyses show that in the urban areas people have high calling activity but low mobility, while in the rural areas they show the opposite behaviour, i.e. low calling activity combined with high mobility. Overall, there is a decreasing trend in people's mobility through the year even though their calling activity remained consistent, except for the holidays, during which the communication frequency drops markedly. We have also observed significant differences in mobility between work days and free days. Finally, the age and gender of individuals have also been observed to play a role in the seasonal patterns differently in urban and rural areas.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 16 pages, 9 figures in main text

arXiv:2309.06424 [pdf]

Unveiling the potential of large language models in generating semantic and cross-language clones

Authors: Palash R. Roy, Ajmain I. Alam, Farouq Al-omari, Banani Roy, Chanchal K. Roy, Kevin A. Schneider

Abstract: Semantic and cross-language code clone generation may be useful for code reuse, code comprehension, refactoring and benchmarking. OpenAI's GPT model has potential in such clone generation, as GPT is used for text generation. When developers copy/paste code from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance. In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment. We compiled a diverse set of code fragments and assessed GPT-3's performance in generating code variants. Through extensive experimentation and analysis, where 9 judges spent 158 hours on validation, we investigate the model's ability to produce accurate and semantically correct variants. Our findings shed light on GPT-3's strengths in code generation, offering insights into the potential applications and challenges of using advanced language models in software development. Our quantitative analysis yields compelling results. In the realm of semantic clones, GPT-3 attains an accuracy of 62.14% and a BLEU score of 0.55, achieved through few-shot prompt engineering.
Furthermore, the model transcends linguistic boundaries, achieving 91.25% accuracy in generating cross-language clones.

Submitted 12 September, 2023; originally announced September 2023.

Comments: Accepted in IWSC

arXiv:2308.13963 [pdf]

GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench

Authors: Ajmain Inqiad Alam, Palash Ranjan Roy, Farouq Al-omari, Chanchal Kumar Roy, Banani Roy, Kevin Schneider

Abstract: With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to utilize Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, it's worth noting that BigCloneBench, originally not designed for semantic clone detection, presents several limitations that hinder its suitability as a comprehensive training dataset for this specific purpose. Furthermore, the CLCDSA dataset suffers from a lack of reusable examples aligned with real-world software systems, rendering it inadequate for cross-language clone detection approaches. In this work, we present GPTCloneBench, a comprehensive semantic clone and cross-language clone benchmark built by exploiting SemanticCloneBench and OpenAI's GPT-3 model. In particular, using code fragments from SemanticCloneBench as sample inputs along with appropriate prompt engineering for the GPT-3 model, we generate semantic and cross-language clones for these specific fragments and then conduct a combination of extensive manual analysis, tool-assisted filtering, functionality testing and automated validation in building the benchmark. From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python).
Our benchmark is 15-fold larger than SemanticCloneBench, has more functional code examples for software systems and more programming language support than CLCDSA, and overcomes BigCloneBench's limitations in quality, quantification, and language variety.

Submitted 1 September, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted in 39th IEEE International Conference on Software Maintenance and Evolution (ICSME 2023)

arXiv:2306.16171 [pdf]

A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

Authors: Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, Masoud Ekhtiarzadeh

Abstract: Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.
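As an illustration of the simplest family of techniques such a review covers, the sketch below computes token-based Jaccard similarity between two code fragments. The tokenizer regex is a rough assumption rather than a language-aware lexer, and Jaccard over token sets is only one of many similarity measures.

```python
import re
from typing import Set

# Identifiers, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code: str) -> Set[str]:
    """Split source text into a set of identifier, number, and symbol tokens."""
    return set(TOKEN.findall(code))


def jaccard_similarity(code_a: str, code_b: str) -> float:
    """|A & B| / |A | B| over the token sets of two fragments."""
    a, b = tokens(code_a), tokens(code_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For `int x = 1;` versus `int y = 2;`, the token sets share `int`, `=`, and `;` out of seven distinct tokens, giving 3/7. Set-based measures ignore token order and frequency, which is why they are cheap but weak at detecting semantic clones.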

Submitted 28 June, 2023; originally announced June 2023.

Comments: 49 pages, 10 figures, 6 tables

arXiv:2306.14011 [pdf, other]

Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

Authors: Weicheng Xue, Christohper John Roy

Abstract: Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and threads within a block. Our approach utilizes fully connected neural networks as the underlying machine learning model, with the tuning parameters as inputs to the neural networks and the actual execution time of a simulation as the output. To assess the effectiveness of our autotuning approach, we conducted experiments on three different types of GPUs, with computational speeds ranging from low to high. We performed independent training for each GPU model and also explored combined training across multiple GPU models. By leveraging artificial neural networks, our autotuning technique achieved remarkable results in tuning a wide range of parameters, leading to enhanced performance for a CFD code. Importantly, our approach demonstrated its efficacy while requiring only a small fraction of samples from the large parameter search space. This efficiency is attributed to the effectiveness of the fully connected neural networks in capturing the complex relationships between the parameter settings and the resulting performance. Overall, our study showcases the potential of machine learning, specifically fully connected neural networks, in autotuning GPU-accelerated CFD codes.
By leveraging this approach, researchers and practitioners can achieve high performance in scientific simulations with optimized parameter configurations.

Submitted 20 February, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2305.18057 [pdf, other]

CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver

Authors: Weicheng Xue, Hongyu Wang, Christopher J. Roy

Abstract: This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI, and the CPU-GPU heterogeneous computing workflow in SENSEI leveraging MPI and OpenACC are given. Then, a performance model for CPU-GPU heterogeneous computing requiring ghost cell exchange is proposed to help estimate the performance of the heterogeneous implementation. The scaling performance of the CPU-GPU heterogeneous computing and its comparison with the pure multi-CPU/GPU performance for a supersonic inlet test case are presented to display the advantages of leveraging the computational power of both the CPU and the GPU. Using CPUs and GPUs as workers together, the performance can be improved further compared to using pure CPUs or GPUs, and the advantages can be fairly estimated by the performance model proposed in this work. Finally, conclusions are drawn to provide 1) suggestions for application users who are interested in leveraging the computational power of the CPU and GPU to accelerate their own scientific computing simulations and 2) feedback for hardware architects who are interested in designing a better CPU-GPU heterogeneous system for heterogeneous computing.
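The abstract does not give the performance model itself. A minimal sketch of the kind of reasoning involved, assuming each device has a fixed cell-update throughput and ghost-cell exchange adds a constant cost per step, is shown below. `Device`, the throughput figures, and the cost formula are hypothetical and are not SENSEI's model.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    cells_per_second: float  # hypothetical measured single-device throughput


def step_time(n_cells: int, cpu: Device, gpu: Device, gpu_fraction: float,
              ghost_exchange_s: float) -> float:
    """Wall time of one iteration when CPU and GPU work concurrently.

    The slower side determines compute time; ghost-cell exchange is
    added as a fixed communication cost (an assumption of this sketch).
    """
    gpu_cells = n_cells * gpu_fraction
    cpu_cells = n_cells - gpu_cells
    compute = max(cpu_cells / cpu.cells_per_second,
                  gpu_cells / gpu.cells_per_second)
    return compute + ghost_exchange_s


def balanced_gpu_fraction(cpu: Device, gpu: Device) -> float:
    """Share of cells for the GPU that makes both devices finish together."""
    return gpu.cells_per_second / (cpu.cells_per_second + gpu.cells_per_second)
```

With a CPU at 1e6 cells/s and a GPU at 4e6 cells/s, the balanced GPU fraction is 0.8, and a 1,000,000-cell step takes about 0.2 s of compute versus 0.25 s on the GPU alone. Under these assumptions the ideal compute speedup is (r_cpu + r_gpu) / r_gpu = 1.25, which any ghost-exchange cost then erodes.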

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2304.03563 [pdf, other]

Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions

Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: In Stack Overflow (SO), the quality of posts (i.e., questions and answers) is subjectively evaluated by users through a voting mechanism. The net votes (upvotes - downvotes) obtained by a post are often considered an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e., verified solutions) also do not score the maximum votes. All these counter-intuitive findings cast doubts on the reliability of the evaluation mechanism employed at SO. Moreover, many users raise concerns about the evaluation, especially downvotes to their posts. Therefore, rigorous verification of the subjective evaluation is highly warranted to ensure a non-biased and reliable quality assessment mechanism. In this paper, we compare the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. According to our investigation, four objective metrics agree with the subjective evaluation, two do not agree, one either agrees or disagrees, and the remaining three neither agree nor disagree with the subjective evaluation. We then develop machine learning models to classify the promoted and discouraged questions. Our models outperform the state-of-the-art models, with a maximum accuracy of about 76% to 87%.
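The net-vote definition above is simple arithmetic; the sketch below applies it and assumes, hypothetically, that questions with positive net votes count as "promoted" and those with negative net votes as "discouraged". The abstract does not state the paper's actual labeling thresholds, and the `Question` type is illustrative.

```python
from typing import NamedTuple


class Question(NamedTuple):
    upvotes: int
    downvotes: int


def net_votes(q: Question) -> int:
    """Net score as described in the abstract: upvotes minus downvotes."""
    return q.upvotes - q.downvotes


def label(q: Question) -> str:
    """Hypothetical rule: positive net score -> promoted, negative -> discouraged."""
    score = net_votes(q)
    if score > 0:
        return "promoted"
    if score < 0:
        return "discouraged"
    return "neutral"
```

Because the net score collapses both vote counts into one number, a heavily debated question (many upvotes and many downvotes) can look identical to an ignored one, which is part of why objective text metrics are a useful cross-check.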

Submitted 7 April, 2023; originally announced April 2023.

Comments: Accepted in the International Conference on Mining Software Repositories (MSR 2023)

arXiv:2303.01435 [pdf, other]

Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection

Authors: Subroto Nag Pinku, Debajyoti Mondal, Chanchal K. Roy

Abstract: Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also emerging rapidly. The popularity of deep learning-based clone detection tools creates an opportunity to investigate how known strategies that boost the performance of deep learning models could be further leveraged to improve clone detection tools. In this paper, we investigate such a strategy, data augmentation, which, unlike for single-language clone detection, has not yet been explored for cross-language clone detection. We show how the existing knowledge on transcompilers (source-to-source translators) can be used for data augmentation to boost the performance of cross-language clone detection models, as well as to adapt single-language clone detection models to create cross-language clone detection pipelines. To demonstrate the performance boost for cross-language clone detection through data augmentation, we exploit Transcoder, which is a pre-trained source-to-source translator. To show how to extend single-language models for cross-language clone detection, we extend a popular single-language model, Graph Matching Network (GMN), in combination with the transcompilers. We evaluated our models on popular benchmark datasets.
Our experimental results showed improvements in F1 scores (sometimes up to 3%) for the cutting-edge cross-language clone detection models. Even when extending GMN for cross-language clone detection, the models built leveraging data augmentation outperformed the baseline with scores of 0.90, 0.92, and 0.91 for precision, recall, and F1 score, respectively.
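A minimal sketch of transcompiler-based augmentation, assuming labeled clone pairs and a translator exposed as a plain callable: translating one side of each same-language pair yields a new cross-language pair with the same label. The `Translator` signature and function name are hypothetical and do not reflect Transcoder's real interface.

```python
from typing import Callable, List, Tuple

# A labeled pair: (fragment_a, fragment_b, is_clone).
Pair = Tuple[str, str, bool]

# Hypothetical translator interface: (source_code, target_language) -> code.
Translator = Callable[[str, str], str]


def augment_cross_language(pairs: List[Pair], translate: Translator,
                           target_lang: str) -> List[Pair]:
    """Add a cross-language copy of every pair to the training set.

    The second fragment is translated into `target_lang`; the clone label
    is carried over on the assumption that translation preserves behavior.
    """
    augmented = list(pairs)
    for a, b, is_clone in pairs:
        augmented.append((a, translate(b, target_lang), is_clone))
    return augmented
```

The label carry-over is only as reliable as the translator: a mistranslated fragment silently turns a true clone pair into label noise, so in practice one might filter translated outputs (for example, by compiling or testing them) before training.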

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted at the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023)

ACM Class: D.2; D.2.13

arXiv:2210.03281 [pdf, other]

Automatic Prediction of Rejected Edits in Stack Overflow

Authors: Saikat Mondal, Gias Uddin, Chanchal Roy

Abstract: The content quality of shared knowledge in Stack Overflow (SO) is crucial in supporting software developers with their programming problems. Thus, SO allows its users to suggest edits to improve the quality of a post (i.e., question and answer). However, existing research shows that many suggested edits in SO are rejected due to undesired contents/formats or violations of edit guidelines. Such a scenario frustrates or demotivates users who would like to make good-quality edits. Therefore, our research focuses on assisting SO users by offering them suggestions on how to improve their editing of posts. First, we manually investigate 764 (382 questions + 382 answers) rejected edits by rollbacks and produce a catalog of 19 rejection reasons. Second, we extract 15 text- and user-based features to capture those rejection reasons. Third, we develop four machine learning models using those features. Our best-performing model can predict rejected edits with 69.1% precision, 71.2% recall, 70.1% F1-score, and 69.8% overall accuracy. Fourth, we introduce an online tool named EditEx that works with the SO edit system. EditEx can assist users while editing posts by suggesting the potential causes of rejections. We recruit 20 participants to assess the effectiveness of EditEx. Half of the participants (i.e., treatment group) use EditEx, and the other half (i.e., control group) use the SO standard edit system to edit posts. According to our experiment, EditEx can support the SO standard edit system in preventing 49% of rejected edits, including the commonly rejected ones.
It can also prevent 12% of rejections even in free-form regular edits. The treatment group finds the potential rejection reasons identified by EditEx influential. Furthermore, the median workload of suggesting edits using EditEx is half that of the SO edit system.
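The reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR/(P + R). A quick arithmetic check in Python (an illustrative sketch, not an artifact of the study; the function name `f1_score` is our own):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the best-performing model.
print(f"F1 = {f1_score(0.691, 0.712):.3f}")
```

This prints `F1 = 0.701`, matching the 70.1% F1-score in the abstract.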

Submitted 6 October, 2022; originally announced October 2022.

Comments: Accepted for publication in Empirical Software Engineering (EMSE) journal

arXiv:2204.11449

OCFormer: One-Class Transformer Network for Image Classification

Authors: Prerana Mukherjee, Chandan Kumar Roy, Swalpa Kumar Roy

Abstract: We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. In prior works, there have been tremendous efforts to learn a good representation using a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is extensively evaluated on the CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifier approaches.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.03764

doi 10.1145/3524610.3527920

Backports: Change Types, Challenges and Strategies

Authors: Debasish Chakroborti, Kevin A. Schneider, Chanchal K. Roy

Abstract: Source code repositories allow developers to manage multiple versions (or branches) of a software system. Pull requests are used to modify a branch, and backporting is a regular activity used to port changes from a current development branch to other versions. In open-source software, backports are common and often need to be adapted by hand, which motivates us to explore backports and backporting challenges and strategies. In our exploration of 68,424 backports from 10 GitHub projects, we found that bug, test, document, and feature changes are commonly backported. We identified a number of backporting challenges, including that backports were inconsistently linked to their original pull request (49%), that backports had incompatible code (13%), that backports failed to be accepted (10%), and that there were backporting delays (16 days to create, 5 days to merge). We identified some general strategies for addressing backporting issues. We also noted that backporting strategies depend on the project type and that further investigation is needed to determine their suitability. Furthermore, we created the first-ever backports dataset that can be used by other researchers and practitioners for investigating backports and backporting.

Submitted 7 April, 2022; originally announced April 2022.

Comments: In 30th International Conference on Program Comprehension (ICPC 22), May 16 to 17, 2022, Virtual Event, Pittsburgh

arXiv:2201.10137

Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

Authors: Md Nadim, Debajyoti Mondal, Chanchal K. Roy

Abstract: The most common use of data visualization is to reduce complexity and aid understanding. A graph is one of the most commonly used representations for understanding relational data: it produces a simplified representation of data that would be challenging to comprehend if kept in a textual format. In this study, we propose a methodology that uses the relational properties of source code, in the form of a graph, for Just-in-Time (JIT) bug prediction in software systems across revisions during software evolution and maintenance. We present a method to convert the source code of commit patches into equivalent graph representations, which we name Source Code Graphs (SCGs). To understand and compare multiple source code graphs, we extract several structural properties of these graphs, such as density, number of cycles, nodes, and edges. We then use the attribute values of those SCGs to visualize and detect buggy software commits. We process more than 246K software commits from 12 subject systems in this investigation. Our investigation of these 12 open-source software projects written in C++ and Java shows that combining the SCG features with the conventional features used in similar studies increases the performance of Machine Learning (ML) based buggy commit detection models. We also find that the increase in F1-scores for predicting buggy and non-buggy commits is statistically significant according to the Wilcoxon Signed Rank Test.
Since SCG-based feature values represent the style or structural properties of source code updates or changes in the software system, our results suggest the importance of carefully maintaining source code style or structure to keep a software system bug-free.
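The abstract names density, number of cycles, nodes, and edges as SCG features. A minimal sketch of such features, assuming an SCG can be treated as a simple undirected graph given as an edge list; the paper's graph construction and exact metric definitions are not stated here, so `scg_features`, the density formula 2E/(N(N-1)), and counting cycles as the cycle rank E - N + C (independent cycles) are all illustrative assumptions:

```python
from collections import defaultdict


def scg_features(edges):
    """Structural properties of a simple undirected graph.

    Hypothetical stand-in for SCG feature extraction: `edges` is a list of
    (u, v) pairs; self-loops and duplicate edges are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    adj = defaultdict(set)
    for u, v in (tuple(e) for e in unique):
        adj[u].add(v)
        adj[v].add(u)

    # Count connected components with an iterative depth-first search.
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)

    n, m = len(adj), len(unique)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Cycle rank E - N + C = number of independent cycles (assumed definition).
    cycles = m - n + components
    return {"nodes": n, "edges": m, "density": density, "cycles": cycles}


# Hypothetical graph: a triangle A-B-C plus a dangling node D.
features = scg_features([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(features)
```

For this toy graph the sketch reports 4 nodes, 4 edges, a density of 2/3, and one independent cycle (the triangle). Per-commit vectors of this kind are what would be combined with conventional features in a classifier.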

Submitted 25 January, 2022; originally announced January 2022.

Comments: Accepted for publication in Automated Software Engineering (AUSE), an international journal published by Springer

arXiv:2201.07996

doi 10.1016/j.jss.2022.111229

Evaluating the Performance of Clone Detection Tools in Detecting Cloned Co-change Candidates

Authors: Md Nadim, Manishankar Mondal, Chanchal K. Roy, Kevin Schneider

Abstract: Co-change candidates are groups of code fragments that require a change if any of these fragments is modified in a commit operation during software evolution. Cloned co-change candidates are a subset of the co-change candidates whose members are clones of one another. They are usually created by reusing existing code fragments in a software system. Detecting cloned co-change candidates is essential for clone tracking, and studies have shown that clone detection tools can be used to find them. However, although several studies evaluate clone detection tools for their accuracy in detecting cloned fragments, we found no study that evaluates clone detection tools for detecting cloned co-change candidates. In this study, we explore this dimension of code clone research. We compare the performance of 12 different configurations of nine promising clone detection tools in identifying cloned co-change candidates from eight open-source C and Java-based subject systems of various sizes and application domains. A ranked list and analysis of the results provide valuable insights and guidelines for selecting and configuring a clone detection tool for identifying co-change candidates, and open a new dimension of code clone research into change impact analysis.

Submitted 19 January, 2022; originally announced January 2022.

Comments: Accepted for publication in The Journal of Systems and Software (JSS)

arXiv:2112.07719

Decomposing the Deep: Finding Class Specific Filters in Deep CNNs

Authors: Akshay Badola, Cherian Roy, Vineet Padmanabhan, Rajendra Lal

Abstract: Interpretability of Deep Neural Networks has become a major area of exploration. Although these networks have achieved state-of-the-art accuracy in many tasks, it is extremely difficult to interpret and explain their decisions. In this work we analyze the final and penultimate layers of Deep Convolutional Networks and provide an efficient method for identifying subsets of features that contribute most towards the network's decision for a class. We demonstrate that the number of such features per class is much lower than the dimension of the final layer, and therefore that the decision surface of Deep CNNs lies on a low-dimensional manifold and is proportional to the network depth. Our methods allow the final layer to be decomposed into separate subspaces, which are far more interpretable and have a lower computational cost than the final layer of the full network.

Submitted 3 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 22 pages, 5 figures, 8 tables. github repo: https://github.com/akshaybadola/cnn-class-specific-filters-with-histogram. Preprint submitted to Elsevier. This version contains visualization of filters and ablation study w.r.t. influential features

arXiv:2111.12204

The Reproducibility of Programming-Related Issues in Stack Overflow Questions

Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy, Kevin Schneider

Abstract: Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and a description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments, which may prevent questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues. The outcomes of our study are three-fold. First, we found that we can reproduce approximately 68% of Java and 71% of Python issues, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues using the code segments. Of the issues that were reproducible, approximately 67% of the Java code segments and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines for writing effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer metadata, such as the presence of an accepted answer.
According to our analysis, a reproducible question is at least twice as likely to receive an accepted answer as an irreproducible question. Besides, the median time delay in receiving accepted answers doubles if the issues reported in questions could not be reproduced. We also investigate confounding factors (e.g., reputation) and find that they do not weaken the correlation between reproducibility status and answer metadata.

Submitted 25 December, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: This study has been accepted for publication in the Empirical Software Engineering (EMSE) journal

arXiv:2111.03196

An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets

Authors: Gias Uddin, Yann-Gael Gueheneuc, Foutse Khomh, Chanchal K Roy

Abstract: Sentiment analysis in software engineering (SE) has shown promise for analyzing and supporting diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sentiment detection tools from two recently published papers by Lin et al. [31, 32], who first reported negative results with stand-alone sentiment detectors and then proposed an improved SE-specific sentiment detector, POME [31]. We report the study results on 17,581 units (sentences/documents) coming from six currently available sentiment benchmarks for SE. We find that the existing tools can be complementary to each other in 85-95% of the cases, i.e., one is wrong, but another is right. However, a majority voting-based ensemble of those tools fails to improve the accuracy of sentiment detection. We develop Sentisead, a supervised tool that uses the polarity labels and bag of words as features. Sentisead improves the performance (F1-score) of the individual tools by 4% (over Senti4SD [5]) to 100% (over POME [31]). In the second phase, we compare and improve the Sentisead infrastructure using Pre-trained Transformer Models (PTMs). We find that a Sentisead infrastructure with RoBERTa as the ensemble of the five stand-alone rule-based and shallow learning SE-specific tools from Lin et al.
[31, 32] offers the best F1-score of 0.805 across the six datasets, while a stand-alone RoBERTa shows an F1-score of 0.801.
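The majority-voting baseline mentioned above can be sketched as a plurality vote over the polarity labels emitted by the stand-alone tools. The function name, the label strings, and returning `None` on ties are illustrative assumptions; the study's tie-breaking rule is not described in the abstract:

```python
from collections import Counter


def majority_vote(labels):
    """Plurality vote over polarity labels; returns None on a tie or empty input."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumed tie handling: abstain when no single winner
    return ranked[0][0]


# Hypothetical labels from five stand-alone detectors for one sentence.
print(majority_vote(["positive", "neutral", "positive", "negative", "positive"]))
```

Such a vote discards which tool was right in which context; a supervised combiner like Sentisead instead learns from the individual labels (plus bag-of-words features), which is one plausible reason it can succeed where voting does not.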

Submitted 4 November, 2021; originally announced November 2021.

Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM), 2021

arXiv:2109.03624

FaBiAN: A Fetal Brain magnetic resonance Acquisition Numerical phantom

Authors: Hélène Lajous, Christopher W. Roy, Tom Hilbert, Priscille de Dumast, Sébastien Tourbier, Yasser Alemán-Gómez, Jérôme Yerly, Thomas Yu, Hamza Kebiri, Kelly Payette, Jean-Baptiste Ledoux, Reto Meuli, Patric Hagmann, Andras Jakab, Vincent Dunet, Mériam Koob, Tobias Kober, Matthias Stuber, Meritxell Bach Cuadra

Abstract: Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, acquisitions of satisfactory quality remain scarce in this cohort of sensitive subjects, thus hindering the validation of advanced image processing techniques. Numerical phantoms can mitigate these limitations by providing a controlled environment with a known ground truth. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates clinical T2-weighted fast spin echo sequences of the fetal brain. This unique tool is based on a general, flexible and realistic setup that includes stochastic fetal movements, thus providing images of the fetal brain throughout maturation comparable to clinical acquisitions. We demonstrate its value in evaluating the robustness and optimizing the accuracy of an algorithm for super-resolution fetal brain magnetic resonance imaging from simulated motion-corrupted 2D low-resolution series, as compared to a synthetic high-resolution reference volume. We also show that the generated images can complement clinical datasets to support data-intensive deep learning methods for fetal brain tissue segmentation.

Submitted 6 September, 2021; originally announced September 2021.

Comments: 23 pages, 9 figures (including Supplementary Material), 4 tables, 1 supplement. Submitted to Scientific Reports (2021)

arXiv:2109.00659

Semantic Slicing of Architectural Change Commits: Towards Semantic Design Review

Authors: Amit Kumar Mondal, Chanchal K. Roy, Kevin A. Schneider, Banani Roy, Sristy Sumana Nath

Abstract: Software architectural changes involve more than one module or component and are more complex to analyze than local code changes. Development teams aiming to review the architectural aspects (design) of a change commit consider many essential scenarios, such as access rules and restrictions on the usage of program entities across modules. Moreover, design review is essential when proper architectural formulations are paramount for developing and deploying a system. Untangling architectural changes, recovering semantic design, and producing design notes are the crucial tasks of the design review process. To support these tasks, we construct a lightweight tool [4] that can detect and decompose semantic slices of a commit containing architectural instances. A semantic slice consists of a description of the relational information of the involved modules, their classes, methods and connected modules in a change instance, which is easy for a reviewer to understand. We extract various directory and naming structures (DANS) properties from the source code to develop our tool. Using the DANS properties, our tool first detects architectural change instances based on our defined metric and then decomposes the slices (based on string processing). Our preliminary investigation with ten open-source projects (developed in Java and Kotlin) reveals that the DANS properties yield high precision and recall (93-100%) for detecting and generating architectural slices.
Our proposed tool will serve as a preliminary approach to semantic design recovery and design summary generation for project releases.

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2108.09646

A Systematic Review of Automated Query Reformulations in Source Code Search

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during code search. Over the years, many studies have attempted to reformulate developers' ad hoc queries to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulation from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, the vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulation.

Submitted 8 June, 2023; v1 submitted 22 August, 2021; originally announced August 2021.

Comments: 81 pages, accepted at TOSEM

ACM Class: D.2.5; D.2.1; D.2.7; D.2.13

arXiv:2108.05341

The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study

Authors: Mohammad Masudur Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

Abstract: Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on the bug reports they use. A significant number of bug reports contain only plain natural language text. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. Yet recent evidence suggests that even these natural language-only reports contain enough good keywords to help localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to clarify this issue by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports, although these reports contain such queries.
We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in terms of several keyword characteristics, which has led us to actionable insights. Furthermore, we demonstrate a 27%-34% improvement in the performance of non-optimal queries by applying these actionable insights to them.

Submitted 11 August, 2021; originally announced August 2021.

Comments: 57 pages, EMSE (2021)

ACM Class: D.2; D.2.5; D.2.7

arXiv:2108.02702

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Authors: Rodrigo F. Silva, M. Masudur Rahman, Carlos Eduardo Dantas, Chanchal Roy, Foutse Khomh, Marcelo A. Maia

Abstract: Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. The semantic gap may also widen for similar bags of words due to antonyms and negation. Moreover, documents retrieved by search engines might not contain solutions that include both code examples and their explanations. We therefore propose CRAR (Crowd Answer Recommender) to circumvent those issues, aiming to improve the retrieval of relevant answers from Stack Overflow that contain not only the expected code examples for the given task but also their explanations. Given a programming task, we investigate the effectiveness of combining information retrieval techniques with a set of features, including semantic features such as word embeddings and sentence embeddings (e.g., from a Convolutional Neural Network (CNN)), to enhance the ranking of important threads (i.e., the units containing questions along with their answers) for the given task, and then select relevant answers contained in those threads. CRAR also leverages social aspects of Stack Overflow discussions, such as popularity, to select relevant answers for the tasks. Our experimental evaluation shows that the combination of the different features performs better than each one individually.
We also compare the retrieval performance with the state-of-the-art CROKAGE (Crowd Knowledge Answer Generator), which is also a system aimed at retrieving relevant answers from Stack Overflow. We show that CRAR outperforms CROKAGE in Mean Reciprocal Rank and Mean Recall with small and medium effect sizes, respectively.
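Mean Reciprocal Rank (MRR), one of the two metrics in the comparison above, averages 1/rank of the first relevant answer over all queries. A minimal sketch with hypothetical answer IDs (not data from the paper); `mean_reciprocal_rank` and its argument layout are our own choices:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the first relevant item per query.

    ranked_results: one ranked list of answer IDs per query.
    relevant: one set of relevant answer IDs per query, in the same order.
    A query with no relevant answer in its list contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, gold in zip(ranked_results, relevant):
        for rank, answer in enumerate(results, start=1):
            if answer in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)


# Hypothetical answer IDs for three queries.
runs = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
gold = [{"a1"}, {"b3"}, {"x9"}]
print(round(mean_reciprocal_rank(runs, gold), 3))
```

Here the first relevant answers appear at ranks 1, 3, and nowhere, so MRR = (1 + 1/3 + 0)/3 ≈ 0.444. Because MRR rewards only the first hit, it complements a recall-style metric that counts all relevant answers retrieved.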

Submitted 5 August, 2021; originally announced August 2021.

Comments: 31 pages, 5 figures, 9 tables

arXiv:2102.08874

Mining API Usage Scenarios from Stack Overflow

Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

Abstract: We propose a framework to mine API usage scenarios from Stack Overflow. Each usage scenario consists of a code example, the description of the task it addresses, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task by summarizing the discussions around the code example. Third, we automatically associate developers' reactions (i.e., positive and negative opinions) towards the code example to offer information about code quality. We evaluate the algorithms using three benchmarks.

Submitted 17 February, 2021; originally announced February 2021.

Journal ref: 2020 Information and Software Technology (IST)

arXiv:2102.08502

Automatic API Usage Scenario Documentation from Technical Q&A Sites

Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

Abstract: The online technical Q&A site Stack Overflow (SO) is popular among developers for supporting their coding and diverse development needs. To address shortcomings in official API documentation, several studies have focused on augmenting it with insights (e.g., code examples) from SO. These techniques add code examples and insights about APIs to their official documentation. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this paper, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm, called statistical documentation, shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm, called concept-based documentation, clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform that aggregates information about APIs from online forums.
We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on the documentation produced from the posts.

Submitted 16 February, 2021; originally announced February 2021.

Journal ref: 2021 ACM Transactions on Software Engineering and Methodology (TOSEM)

arXiv:2012.02925

doi 10.1016/j.jpdc.2021.05.010

An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC

Authors: Weicheng Xue, Charles W. Jackson, Christoper J. Roy

Abstract: This paper focuses on improving the multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that 16 P100 GPUs and 16 V100 GPUs can be 30× and 70× faster, respectively, than 16 Xeon CPU E5-2680v4 cores for three different test cases. A series of performance issues related to the scaling of the multi-block CFD code are addressed by applying various optimizations. Performance optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI Isend/Irecv and Wait calls, reducing unnecessary implicit derived-type member data movement between the host and the device, and using GPUDirect can improve compute utilization, memory throughput, and asynchronous progression in the multi-block CFD code using modern programming features.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 43 pages, 27 figures

arXiv:2007.06544 [pdf]

doi 10.1002/mrm.28713

Free-running SIMilarity-Based Angiography (SIMBA) for simplified anatomical MR imaging of the heart

Authors: John Heerfordt, Kevin K. Whitehead, Jessica A. M. Bastiaansen, Lorenzo Di Sopra, Christopher W. Roy, Jérôme Yerly, Bastien Milani, Mark A. Fogel, Matthias Stuber, Davide Piccini

Abstract: Purpose: Whole-heart MRA techniques typically target pre-determined motion states and address cardiac and respiratory dynamics independently. We propose a novel fast reconstruction algorithm, applicable to ungated free-running sequences, that leverages inherent similarities in the acquired data to avoid such physiological constraints. Theory and Methods: The proposed SIMilarity-Based Angiography (SIMBA) method clusters the continuously acquired k-space data in order to find a motion-consistent subset that can be reconstructed into a motion-suppressed whole-heart MRA. Free-running 3D radial datasets from six ferumoxytol-enhanced scans of pediatric cardiac patients and twelve non-contrast scans of healthy volunteers were reconstructed with a non-motion-suppressed regridding of all the acquired data (All Data), our proposed SIMBA method, and a previously published free-running framework (FRF) that uses cardiac and respiratory self-gating and compressed sensing. Images were compared for blood-myocardium interface sharpness, contrast ratio, and visibility of coronary artery ostia. Results: Both the fast SIMBA reconstruction (~20 s) and the FRF provided significantly higher blood-myocardium sharpness than All Data (P<0.001). No significant difference was observed between SIMBA and FRF. Significantly higher blood-myocardium contrast ratio was obtained with SIMBA compared to All Data and FRF (P<0.01).
More coronary ostia could be visualized with both SIMBA and FRF than with All Data (All Data: 4/36, SIMBA: 30/36, FRF: 33/36, both P<0.001), but no significant difference was found between SIMBA and FRF. Conclusion: The combination of free-running sequences and the fast SIMBA reconstruction, which operates without a priori assumptions related to physiological motion, forms a simple workflow for obtaining whole-heart MRA with sharp anatomical structures.

Submitted 13 July, 2020; originally announced July 2020.

Comments: 8 figures, 2 tables

Journal ref: Magnetic Resonance in Medicine, 24 February 2021

arXiv:2006.15682 [pdf, other]

A Survey on the Evaluation of Clone Detection Performance and Benchmarking

Authors: Jeffrey Svajlenko, Chanchal K. Roy

Abstract: There are a great many clone detection tools proposed in the literature. In this paper, we investigate the state of clone detection tool evaluation. We begin by surveying the clone detection benchmarks and performing a multi-faceted evaluation and comparison of their features and capabilities. We then survey the existing clone detection tool and technique publications, and evaluate how the authors of these works evaluate their own tools/techniques. We rank the individual works by how well they measure recall, precision, execution time, and scalability. We select the works that best evaluate all four metrics as exemplars that future researchers publishing clone detection tools/techniques should consider when designing the empirical evaluation of their tool/technique. We measure statistics on tool evaluation by the authors and find that evaluation is poor amongst the authors. We finish our investigation into clone detection evaluation by surveying the existing tool comparison studies, including both qualitative and quantitative studies.

Submitted 28 June, 2020; originally announced June 2020.

Comments: 109 pages, review article, several figures and tables, and 277 references. It covers the whole area of clone detection and evaluation literature

arXiv:2006.02602 [pdf, other]

doi 10.1002/cpe.6036

Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms

Authors: Weicheng Xue, Christopher J. Roy

Abstract: This paper investigates the multi-GPU performance of a 3D buoyancy-driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions significantly affects strong-scaling performance on GPUs. Without proper performance optimizations, 1D domain decomposition is shown to scale poorly on multiple GPUs due to noncontiguous memory access. Every decomposition can benefit from the series of performance optimizations presented in the paper. Since the buoyancy-driven cavity code is latency-bound on the clusters examined, a series of optimizations, both platform-agnostic and platform-tailored, are designed to reduce the latency cost and efficiently improve memory throughput between hosts and devices. First, the parallel message packing/unpacking strategy developed for noncontiguous data movement between hosts and devices improves the overall performance by about a factor of 2. Second, transferring different data based on the stencil sizes for different variables further reduces the communication overhead. These two optimizations are general enough to benefit stencil computations with ghost-cell exchanges on all of the clusters tested. Third, GPUDirect is used to improve communication on clusters that have hardware and software support for direct communication between GPUs without staging through CPU memory. Finally, overlapping communication and computation is shown to be inefficient on multiple GPUs when using only MPI or MPI+OpenACC.
Although we believe our implementation exposes enough overlap, the actual runs do not exploit it well due to a lack of asynchronous progression.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2005.02335 [pdf, other]

Don't Explain without Verifying Veracity: An Evaluation of Explainable AI with Video Activity Recognition

Authors: Mahsan Nourani, Chiradeep Roy, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, Vibhav Gogate

Abstract: Explainable machine learning and artificial intelligence models have been used to justify a model's decision-making process. This added transparency aims to help improve user performance and understanding of the underlying model. However, in practice, explainable systems face many open questions and challenges. Specifically, designers might reduce the complexity of deep learning models in order to provide interpretability. The explanations generated by these simplified models, however, might not accurately justify and be truthful to the model. This can further confuse users, as they might not find the explanations meaningful with respect to the model predictions. Understanding how these explanations affect user behavior is an ongoing challenge. In this paper, we explore how explanation veracity affects user performance and agreement in intelligent systems. Through a controlled user study with an explainable activity recognition system, we compare variations in explanation veracity for a video review and querying task. The results suggest that low-veracity explanations significantly decrease user performance and agreement compared to both accurate explanations and a system without explanations. These findings demonstrate the importance of accurate and understandable explanations and caution that poor explanations can sometimes be worse than no explanations with respect to their effect on user performance and reliance on an AI system.

Submitted 5 May, 2020; originally announced May 2020.

ACM Class: H.1.2

arXiv:2005.01005 [pdf, other]

doi 10.1109/CSMR-WCRE.2014.6747168

The Vision of Software Clone Management: Past, Present, and Future

Authors: Chanchal K. Roy, Minhaz F. Zibran, Rainer Koschke

Abstract: Duplicated code, or code clones, is a kind of code smell that has both positive and negative impacts on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while research in recent years extends to the whole spectrum of clone management. In the last decade, three surveys appeared in the literature, which cover the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey on the state of the art in clone management, with in-depth investigation of clone management activities (e.g., tracing, refactoring, cost-benefit analysis) beyond detection and analysis. This is the first survey on clone management, where we point to the achievements so far and reveal avenues for further research necessary towards an integrated clone management system. We believe that we have done a good job in surveying the area of clone management and that this work may serve as a kind of roadmap for future research in the area.

Submitted 3 May, 2020; originally announced May 2020.

Comments: 16 pages

Journal ref: 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerp, 2014, pp. 18-33

arXiv:2005.00967 [pdf, other]

A Machine Learning Based Framework for Code Clone Validation

Authors: Golam Mostaeen, Banani Roy, Chanchal Roy, Kevin Schneider, Jeffrey Svajlenko

Abstract: A code clone is a pair of similar code fragments, within or between software systems. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been proposed and studied over the last decade. To detect all possible similar source code patterns in general, clone detection tools work at the syntax level and lack user-specific preferences. This often means the clones must be manually inspected before analysis in order to remove false positives from consideration. This manual clone validation effort is very time-consuming and often error-prone, in particular for large-scale clone detection. In this paper, we propose a machine learning approach for automating the validation process. Our machine learning-based approach is used to automatically validate clones without human inspection. Thus the proposed approach can be used to remove false positive clones from the detection results, automatically evaluate the precision of any clone detector for any given set of datasets, evaluate existing clone benchmark datasets, or even build new clone benchmarks and datasets with minimum effort. In an experiment with clones detected by several clone detectors in several different software systems, we found our approach has an accuracy of up to 87.4% when compared against manual validation by multiple expert judges. The proposed method also shows better results in several comparative studies with existing related approaches for clone classification.

Submitted 2 May, 2020; originally announced May 2020.

arXiv:2003.07970 [pdf, other]

doi 10.1145/3379597.3387512

An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset

Authors: Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou, Debasish Chakroborti, Banani Roy, Chanchal K. Roy, Kevin Schneider

Abstract: The fork-based development mechanism provides the flexibility and unified processes for software teams to collaborate easily in a distributed setting without too much coordination overhead. Currently, multiple social coding platforms support fork-based development, such as GitHub, GitLab, and Bitbucket. Although these platforms share virtually the same features, they have different emphases. As GitHub is the most popular platform and its data is publicly available, most current studies focus on GitHub-hosted projects. However, we observed anecdotal evidence that people are confused about choosing among these platforms and that some projects are migrating from one platform to another; the reasons behind these activities remain unknown. With the advances of the Software Heritage Graph Dataset (SWHGD), we have the opportunity to investigate forking activities across platforms. In this paper, we conduct an exploratory study on 10 popular open-source projects to identify cross-platform forks and investigate the motivation behind them. Preliminary results show that cross-platform forks do exist. For the 10 subject systems in this study, we found 81,357 forks in total, among which 179 forks are on GitLab. Based on our qualitative analysis, we found that most of the cross-platform forks that we identified are mirrors of repositories on another platform, but we still found cases that were created due to a preference for certain functionalities (e.g., Continuous Integration (CI)) supported by different platforms.
This study lays the foundation for future research directions, such as understanding the differences between platforms and supporting cross-platform collaboration.

Submitted 17 March, 2020; originally announced March 2020.

Comments: Accepted at 17th International Conference on Mining Software Repositories, October 5--6, 2020, Seoul, Republic of Korea

arXiv:1910.11125 [pdf, other]

Micro-level Modularity of Computation-intensive Programs in Big Data Platforms: A Case Study with Image Data

Authors: Amit Kumar Mondal, Banani Roy, Chanchal K. Roy, Kevin A. Schneider

Abstract: With the rapid advancement of Big Data platforms such as Hadoop, Spark, and Dataflow, many tools are being developed that are intended to provide end users with an interactive environment for large-scale data analysis (e.g., IQmulus). However, there are challenges in using these platforms. For example, developers find it difficult to use these platforms when developing interactive and reusable data analytic tools. One approach to better support interactivity and reusability is micro-level modularization of computation-intensive tasks, which splits data operations into independent, composable modules. However, modularizing data- and computation-intensive tasks into independent components differs from traditional programming, e.g., in accessing large-scale data, controlling data flow among components, and structuring computation logic. In this paper, we present a case study on modularizing real-world computation-intensive tasks that investigates the impact of modularization on processing large-scale image data. To that end, we synthesize image data-processing patterns and propose a unified modular model for the effective implementation of computation-intensive tasks on data-parallel frameworks, considering reproducibility, reusability, and customization. We present various insights on using the modularity model based on our experimental results from running image processing tasks on Spark and Hadoop clusters.

Submitted 19 October, 2019; originally announced October 2019.

arXiv:1909.04238 [pdf, other]

LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

Authors: Ming Wu, Pengcheng Wang, Kangqi Yin, Haoyu Cheng, Yun Xu, Chanchal K. Roy

Abstract: Detecting large-variance code clones (i.e., clones with relatively more differences) in large-scale code repositories is difficult because most current tools can only detect almost identical or very similar clones. Detecting such clones would benefit software applications such as bug detection, code completion, and software analysis. Recently, CCAligner made an attempt to detect clones with relatively concentrated modifications, called large-gap clones. Our contribution is a novel and effective approach for detecting large-variance clones in more general cases, covering not only concentrated but also scattered code modifications. We propose a detector named LVMapper, which borrows and adapts the sequence alignment approach from bioinformatics that can find two similar sequences with more differences. LVMapper was tested on both self-synthesized datasets and real cases, and the results show substantial improvement in detecting large-variance clones compared with other state-of-the-art tools, including CCAligner. Furthermore, our new tool also achieves good recall and precision for general Type-1, Type-2, and Type-3 clones on the widely used benchmark dataset BigCloneBench.
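
LVMapper adapts sequence alignment from bioinformatics, but the abstract does not give its scoring scheme. As a hedged illustration of the underlying idea only, the sketch below is the textbook Smith-Waterman local alignment score applied to token sequences, with assumed scores (match +2, mismatch -1, gap -1); it is not LVMapper's algorithm. Gaps let the alignment skip inserted tokens, which is how alignment tolerates modifications that defeat exact or near-exact matching.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between token sequences a and b.

    Textbook Smith-Waterman: h[i][j] is the best score of an alignment ending at
    a[i-1] and b[j-1]; clamping at 0 lets a local alignment start anywhere.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # token of a aligned against a gap in b
            left = h[i][j - 1] + gap    # token of b aligned against a gap in a
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


# Hypothetical token streams: the second has an extra statement inserted.
original = "sum = 0 ; for i in xs : sum += i".split()
modified = "sum = 0 ; log ( xs ) ; for i in xs : sum += i".split()
print(smith_waterman(original, modified))  # 12 matches (+24) and 5 gaps (-5): 19
```

All 12 tokens of `original` still align despite the five inserted tokens, so the pair scores higher than any shorter exact-match fragment (the longest common run of 9 tokens alone scores 18).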

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1909.03166 [pdf, other]

Equalizing Recourse across Groups

Authors: Vivek Gupta, Pegah Nokhiz, Chitradeep Dutta Roy, Suresh Venkatasubramanian

Abstract: The rise in machine learning-assisted decision-making has led to concerns about the fairness of the decisions and techniques to mitigate problems of discrimination. If a negative decision is made about an individual (denying a loan, rejecting an application for housing, and so on), justice dictates that we be able to ask how we might change circumstances to get a favorable decision the next time. Moreover, the ability to change circumstances (a better education, improved credentials) should not be limited to only those with access to expensive resources. In other words, recourse for negative decisions should be considered a desirable value that can be equalized across (demographically defined) groups. This paper describes how to build models that make accurate predictions while still ensuring that the penalties for a negative outcome do not disadvantage different groups disproportionately. We measure recourse as the distance of an individual from the decision boundary of a classifier. We then introduce a regularized objective to minimize the difference in recourse across groups. We explore linear settings and further extend recourse to non-linear settings as well as model-agnostic settings where the exact distance from the boundary cannot be calculated. Our results show that we can successfully decrease the unfairness in recourse while maintaining classifier performance.
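
Recourse is measured as an individual's distance from the classifier's decision boundary. For a linear classifier with weights w and bias b, that distance is |w·x + b| / ‖w‖. The sketch below computes it together with a simple group gap; averaging over each group's negatively classified members is an assumption made here for illustration, and all names are hypothetical. The paper's regularized objective and its non-linear and model-agnostic extensions are not reproduced.

```python
import math


def decision_score(w, b, x):
    """Signed score w.x + b of a linear classifier; negative means a negative decision."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def recourse(w, b, x):
    """Euclidean distance from x to the decision boundary w.x + b = 0."""
    return abs(decision_score(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def recourse_gap(w, b, group_a, group_b):
    """Absolute difference in mean recourse of each group's negatively classified members."""
    def mean_recourse(points):
        dists = [recourse(w, b, x) for x in points if decision_score(w, b, x) < 0]
        return sum(dists) / len(dists) if dists else 0.0
    return abs(mean_recourse(group_a) - mean_recourse(group_b))


w, b = [3.0, 4.0], -5.0                    # hypothetical linear model, ||w|| = 5
print(recourse(w, b, [0.0, 0.0]))          # |0 - 5| / 5 = 1.0
print(recourse_gap(w, b, [[0.0, 0.0]], [[-1.0, -2.0]]))
```

A fairness regularizer in this spirit could add a weighted gap term to the training loss, so that minimizing the loss also pulls the groups' average distances together; the exact form used in the paper may differ.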

Submitted 6 September, 2019; originally announced September 2019.

Comments: 13 pages, 4 figures, 2 tables

arXiv:1904.05514 [pdf, other]

Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach

Authors: Proteek Chandan Roy, Vishnu Naresh Boddeti

Abstract: Image recognition systems have demonstrated tremendous progress over the past few decades thanks, in part, to our ability to learn compact and robust representations of images. As we witness the widespread adoption of these systems, it is imperative to consider the problem of unintended leakage of information from an image representation, which might compromise the privacy of the data owner. This paper investigates the problem of learning an image representation that minimizes such leakage of user information. We formulate the problem as an adversarial non-zero-sum game of finding a good embedding function with two competing goals: to retain as much task-dependent discriminative image information as possible, while simultaneously minimizing the amount of information, as measured by entropy, about other sensitive attributes of the user. We analyze the stability and convergence dynamics of the proposed formulation using tools from non-linear systems theory and compare them to those of the corresponding adversarial zero-sum game formulation that optimizes likelihood as a measure of information content. Numerical experiments on UCI, Extended Yale B, CIFAR-10, and CIFAR-100 datasets indicate that our proposed approach is able to learn image representations that exhibit high task performance while mitigating leakage of predefined sensitive information.
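
Information about a sensitive attribute is measured here by entropy. A minimal sketch of that measure follows, assuming a discrete predicted distribution from a hypothetical adversary. Shannon entropy is highest (log K for K classes) when the prediction is uniform, which is the outcome an encoder maximizing the adversary's entropy would push toward. This shows only the measure, not the paper's adversarial non-zero-sum training game.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical adversary outputs over a binary sensitive attribute.
confident = [0.95, 0.05]   # representation still leaks the attribute
uniform = [0.5, 0.5]       # adversary can do no better than guessing
print(entropy(confident), entropy(uniform), math.log(2))
```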

Submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted for oral presentation at CVPR 2019

arXiv:1903.07662 [pdf, other]

Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge

Authors: Rodrigo F. G. Silva, Chanchal K. Roy, Mohammad Masudur Rahman, Kevin A. Schneider, Klerisson Paixao, Marcelo de Almeida Maia

Abstract: Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might lack a succinct explanation. These problems force developers to browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations. Our proposed approach expands the task description with relevant API classes from Stack Overflow Q&A threads, thereby mitigating the lexical gap problem. Furthermore, unlike earlier studies, we perform natural language processing on the top-quality answers and then return programming solutions containing both code examples and code explanations. We evaluate our approach using 48 programming queries and show that it outperforms six baselines, including the state of the art, by a statistically significant margin.
Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-the-art tool in terms of relevance of the suggested code examples, benefit of the code explanations, and overall solution quality (code + explanation).

Submitted 20 March, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

Comments: Accepted at ICPC, 12 pages, 2019

arXiv:1902.03501 [pdf, other]

Assessing the Local Interpretability of Machine Learning Models

Authors: Dylan Slack, Sorelle A. Friedler, Carlos Scheidegger, Chitradeep Dutta Roy

Abstract: The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on two definitions of interpretability that have been introduced in the machine learning literature: simulatability (a user's ability to run a model on a given input) and "what if" local explainability (a user's ability to correctly determine a model's prediction under local changes to the input, given knowledge of the model's original prediction). Through a user study with 1,000 participants, we test whether humans perform well on tasks that mimic the definitions of simulatability and "what if" local explainability on models that are typically considered locally interpretable. To track the relative interpretability of models, we employ a simple metric, the runtime operation count on the simulatability task. We find evidence that as the number of operations increases, participant accuracy on the local interpretability tasks decreases. In addition, this evidence is consistent with the common intuition that decision trees and logistic regression models are interpretable and are more interpretable than neural networks.

Submitted 2 August, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

arXiv:1812.00975 [pdf, other]

Structure Learning Using Forced Pruning

Authors: Ahmed Abdelatty, Pracheta Sahoo, Chiradeep Roy

Abstract: Markov networks are widely used in many machine learning applications, including natural language processing, computer vision, and bioinformatics. Learning Markov networks involves many complications, ranging from the intractable computations involved to the possibility of learning a model with a huge number of parameters. In this report, we provide a computationally tractable greedy heuristic for learning Markov network structure. The proposed heuristic results in a model with a limited, predefined number of parameters. We ran our method on 3 fully observed real datasets and observed that it performs comparably to state-of-the-art methods.

Submitted 3 December, 2018; originally announced December 2018.

arXiv:1808.00594 [pdf, other]

Improving IR-Based Bug Localization with Context-Aware Query Reformulation

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: Recent findings suggest that Information Retrieval (IR)-based bug localization techniques do not perform well if the bug report lacks rich structured information (e.g., relevant program entity names). Conversely, excessive structured information (e.g., stack traces) in the bug report might not always help automated localization either. In this paper, we propose a novel technique, BLIZZARD, that automatically localizes buggy entities from project source code using appropriate query reformulation and effective information retrieval. In particular, our technique determines whether a bug report (query) contains excessive program entities or not, and then applies appropriate reformulations to the query for bug localization. Experiments using 5,139 bug reports show that our technique can localize the buggy source documents with 7%–56% higher Hit@10, 6%–62% higher MAP@10, and 6%–62% higher MRR@10 than the baseline technique. Comparisons with state-of-the-art techniques and their variants show that our technique improves MAP@10 by 19% and MRR@10 by 20% over the state of the art, and improves 59% of the noisy queries and 39% of the poor queries.
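
The results above are reported as Hit@10, MAP@10, and MRR@10. A hedged sketch of these ranking metrics follows, computed per query over a ranked list of source files and a set of known buggy files, then averaged over queries to give MAP@10 and MRR@10. Average precision has several variants; this one divides by the number of buggy files found in the top K, which may differ from the paper's exact definition. File names are hypothetical.

```python
def hit_at_k(ranked, relevant, k=10):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked, relevant, k=10):
    """1 / rank of the first relevant document within the top k, else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=10):
    """Mean of precision@rank taken at each relevant document in the top k."""
    found, precisions = 0, []
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            found += 1
            precisions.append(found / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_over_queries(metric, queries, k=10):
    """Average a per-query metric over (ranked, relevant) pairs, e.g. MAP@k or MRR@k."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


queries = [
    (["A.java", "B.java", "C.java", "D.java"], {"B.java", "D.java"}),
    (["X.java", "Y.java"], {"Z.java"}),
]
print(mean_over_queries(reciprocal_rank_at_k, queries))
```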

Submitted 1 August, 2018; originally announced August 2018.

Comments: To be presented at The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018), FL, USA

Journal ref: In Proc. ESEC/FSE 2018
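A minimal Python sketch of the three ranking metrics reported in the BLIZZARD abstract (Hit@10, MAP@10, MRR@10), assuming each query yields one ranked list of source files and a set of known buggy files. Average precision is normalized here by the total number of buggy files, which is one of several conventions; the abstract does not say which one the evaluation uses. The file names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One evaluation run per query: (ranked source files, set of known buggy files).
Run = Tuple[Sequence[str], Set[str]]


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any buggy file appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1/rank of the first buggy file within the top k, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Sum of precision@i at every rank i <= k holding a buggy file,
    divided by the total number of buggy files (an assumed normalization)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def evaluate(runs: List[Run], k: int = 10) -> Dict[str, float]:
    """Average each metric over all queries."""
    n = len(runs)
    return {
        f"Hit@{k}": sum(hit_at_k(r, g, k) for r, g in runs) / n,
        f"MAP@{k}": sum(average_precision_at_k(r, g, k) for r, g in runs) / n,
        f"MRR@{k}": sum(reciprocal_rank_at_k(r, g, k) for r, g in runs) / n,
    }


runs = [
    (["A.java", "B.java", "C.java"], {"B.java"}),  # buggy file found at rank 2
    (["D.java", "E.java"], {"F.java"}),            # buggy file not retrieved
]
print(evaluate(runs))  # {'Hit@10': 0.5, 'MAP@10': 0.25, 'MRR@10': 0.25}
```

Relative improvements such as "7%--56% higher Hit@10" would then compare these averages between a reformulated and a baseline query set.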

arXiv:1807.08798

Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query for improved code search. We first collect candidate API classes from Stack Overflow using pseudo-relevance feedback and two term weighting algorithms, and then rank the candidates using Borda count and the semantic proximity between query keywords and the API classes. The semantic proximity is determined from an analysis of 1.3 million questions and answers from Stack Overflow. Experiments using 310 code search queries report that our technique suggests relevant API classes with 48% precision and 58% recall, which are 32% and 48% higher, respectively, than those of the state of the art. Comparisons with two state-of-the-art studies and three popular search engines (i.e., Google, Stack Overflow, and GitHub native search) report that our reformulated queries (1) outperform the queries of the state of the art, and (2) significantly improve the code search results provided by these contemporary search engines.

Submitted 23 July, 2018; originally announced July 2018.

Comments: The 34th International Conference on Software Maintenance and Evolution (ICSME 2018), pp. 12, Madrid, Spain, September, 2018

Journal ref: Proc. ICSME 2018
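The ranking step in this abstract combines candidate lists with a Borda count. The Python sketch below shows only a plain Borda count, under the assumption that each term weighting algorithm produces an ordered candidate list; how the paper then merges this score with semantic proximity is not shown. The API class lists are hypothetical inputs chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate several ranked candidate lists with a Borda count.

    In a list of length n, the candidate at position p (0-based) earns n - p
    points; candidates absent from a list earn nothing from it. Ties are
    broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    return sorted(scores, key=lambda c: (-scores[c], c))


# Two hypothetical candidate lists, e.g. from two different term weighting schemes.
by_weight_a = ["HttpURLConnection", "URL", "InputStream"]
by_weight_b = ["URL", "HttpURLConnection", "BufferedReader"]
print(borda_count([by_weight_a, by_weight_b]))
# ['HttpURLConnection', 'URL', 'BufferedReader', 'InputStream']
```

A Borda count rewards candidates that rank consistently well across lists, so an API class placed near the top by both weighting schemes beats one that only a single scheme favors.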

arXiv:1807.07676

doi 10.1145/3183440.3195003

Poster: Improving Bug Localization with Report Quality Dynamics and Query Reformulation

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: Recent findings from a user study suggest that IR-based bug localization techniques do not perform well if the bug report lacks rich structured information such as relevant program entity names. On the contrary, excessive structured information such as stack traces in the bug report might not always be helpful for automated bug localization. In this paper, we conduct a large empirical study using 5,500 bug reports from eight subject systems and replicating three existing studies from the literature. Our findings (1) empirically demonstrate how the quality dynamics of bug reports affect the performance of IR-based bug localization, and (2) suggest potential ways (e.g., query reformulations) to overcome such limitations.

Submitted 19 July, 2018; originally announced July 2018.

Comments: The 40th International Conference on Software Engineering (Companion volume, Poster Track) (ICSE 2018), pp. 348--349, Gothenburg, Sweden, May, 2018

Journal ref: Proc. ICSE-C 2018, pp. 348--349

arXiv:1807.04488

Improved Query Reformulation for Concept Location using CodeRank and Document Structures

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests effective reformulations to the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries, which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.

Submitted 12 July, 2018; originally announced July 2018.

Comments: The 32nd International Conference on Automated Software Engineering (ASE 2017), pp. 428-439, Urbana-Champaign, Illinois, USA, October, 2017

Report number: 10.1109/ASE.2017.8115655

Journal ref: Proc. ASE 2017, pp. 428-439

arXiv:1807.04485

Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience

Authors: Mohammad Masudur Rahman, Chanchal K. Roy, Raula G. Kula

Abstract: Although peer code review is widely adopted in both commercial and open source development, existing studies suggest that such code reviews often contain a significant number of non-useful review comments. Unfortunately, to date, no tools or techniques exist that can provide automatic support in improving those non-useful comments. In this paper, we first report a comparative study between useful and non-useful review comments, contrasting them by their textual characteristics and the reviewers' experience. Then, based on the findings from the study, we develop RevHelper, a prediction model that can help developers improve their code review comments through automatic prediction of their usefulness during review submission. A comparative study using 1,116 review comments suggested that useful comments share more vocabulary with the changed code, contain salient items like relevant code elements, and their reviewers are generally more experienced. Experiments using 1,482 review comments report that our model can predict comment usefulness with 66% prediction accuracy, which is promising. Comparison with three variants of a baseline model using a case study validates our empirical findings and demonstrates the potential of our model.

Submitted 12 July, 2018; originally announced July 2018.

Comments: The 14th International Conference on Mining Software Repositories (MSR 2017), pp. 215--226, Buenos Aires, Argentina, May, 2017

Report number: 10.1109/MSR.2017.17

Journal ref: Proc. MSR 2017, pp. 215--226
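The RevHelper abstract reports that useful comments share more vocabulary with the changed code. The Python sketch below computes one plausible version of such a feature, the fraction of a comment's distinct tokens that also appear in the changed code, with camelCase identifiers split into words. The tokenization rules, the function names, and the example comment are assumptions for illustration; the abstract does not define RevHelper's exact feature.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased word set; camelCase identifiers are split into their parts."""
    tokens: Set[str] = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # "getUserName" -> get, user, name; "HTTPServer" -> http, server
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", word):
            tokens.add(part.lower())
    return tokens


def vocabulary_overlap(comment: str, changed_code: str) -> float:
    """Fraction of the comment's distinct tokens that also occur in the changed code."""
    comment_tokens = tokenize(comment)
    if not comment_tokens:
        return 0.0
    return len(comment_tokens & tokenize(changed_code)) / len(comment_tokens)


comment = "Rename userName to displayName"
code = "String userName = user.getName();"
print(vocabulary_overlap(comment, code))  # 0.4
```

Here the comment's tokens are {rename, user, name, to, display} and two of them (user, name) occur in the code. No stop-word filtering is applied, so words like "to" dilute the score; a real feature would likely remove them.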

arXiv:1807.04479

RACK: Code Search in the IDE using Crowdsourced Knowledge

Authors: Mohammad Masudur Rahman, Chanchal K. Roy, David Lo

Abstract: Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus require carefully designed queries containing information about programming APIs for code search. Unfortunately, existing studies suggest that preparing an effective query for code search is both challenging and time-consuming for developers. In this paper, we propose a novel code search tool, RACK, that returns relevant source code for a given code search query written in natural language text. The tool first translates the query into a list of relevant API classes by mining keyword-API associations from the crowdsourced knowledge of Stack Overflow, and then applies the reformulated query to the GitHub code search API to collect relevant results. Once a query related to a programming task is submitted, the tool automatically mines relevant code snippets from thousands of open-source projects, and displays them as a ranked list within the context of the developer's programming environment, the IDE. Tool page: http://www.usask.ca/~masud.rahman/rack

Submitted 12 July, 2018; originally announced July 2018.

Comments: The 39th International Conference on Software Engineering (Companion volume) (ICSE 2017), pp. 51--54, Buenos Aires, Argentina, May, 2017

Report number: 10.1109/ICSE-C.2017.11

Journal ref: Proc. ICSE-C 2017, pp. 51--54
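RACK translates a natural language query into API classes using keyword-API associations mined from Stack Overflow. The Python sketch below illustrates that idea with plain co-occurrence counting, assuming each post has already been reduced to a set of question keywords and a set of API classes found in its answer code. The counting scheme, function names, and posts are hypothetical; RACK's actual association measures may differ, and the subsequent GitHub code search call is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def mine_associations(posts: Iterable[Tuple[Set[str], Set[str]]]) -> Dict[str, Counter]:
    """Count how often each keyword co-occurs with each API class across Q&A posts.

    Each post is (keywords from the question text, API classes in the answer code).
    """
    associations: Dict[str, Counter] = defaultdict(Counter)
    for keywords, api_classes in posts:
        for keyword in keywords:
            for api in api_classes:
                associations[keyword][api] += 1
    return associations


def translate_query(query: str, associations: Dict[str, Counter], top_n: int = 3) -> List[str]:
    """Rank API classes by summed co-occurrence counts over the query's keywords."""
    votes: Counter = Counter()
    for keyword in query.lower().split():
        votes.update(associations.get(keyword, Counter()))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_n]]


# Hypothetical pre-processed posts: (question keywords, API classes in the answer).
posts = [
    ({"read", "file"}, {"BufferedReader", "FileReader"}),
    ({"read", "file", "lines"}, {"Files", "BufferedReader"}),
    ({"parse", "json"}, {"JSONObject"}),
]
associations = mine_associations(posts)
print(translate_query("Read file", associations))
# ['BufferedReader', 'FileReader', 'Files']
```

The resulting API class names would then be added to the original keywords to form the reformulated query.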

arXiv:1807.04475

STRICT: Information Retrieval Based Search Term Identification for Concept Location

Authors: Mohammad Masudur Rahman, Chanchal K. Roy

Abstract: During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts illustrate the change requirement involving various domain-related concepts. Software developers need to find appropriate search terms from those concepts so that they can locate the possible change locations in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique, STRICT, that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques, TextRank and POSRank. These IR techniques determine a term's importance based not only on its co-occurrences with other important terms but also on its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique.

Submitted 12 July, 2018; originally announced July 2018.

Comments: The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017), pp. 79--90, Klagenfurt, Austria, February 2017

Report number: 10.1109/SANER.2017.7884611

Journal ref: Proc. SANER 2017, pp. 79--90
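STRICT scores terms partly by their co-occurrences with other important terms, which is the idea behind TextRank. The Python sketch below is a generic TextRank over an undirected co-occurrence graph, not STRICT's implementation: the window size of 2 (adjacent tokens), damping factor 0.85, convergence tolerance, and the pre-processed tokens are assumptions, and the syntactic (POSRank) component is not shown.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def textrank(tokens: Sequence[str], window: int = 2, damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> Dict[str, float]:
    """Score terms with TextRank over an undirected co-occurrence graph.

    Two distinct terms are linked if they occur within `window` consecutive
    tokens of each other. Scores follow the PageRank-style update
    S(v) = (1 - d) + d * sum(S(u) / deg(u) for u in neighbours(v)).
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != term:
                neighbours[term].add(other)
                neighbours[other].add(term)
    if not neighbours:
        return {}

    scores = {term: 1.0 for term in neighbours}
    for _ in range(max_iter):
        updated = {
            term: (1 - damping)
            + damping * sum(scores[u] / len(neighbours[u]) for u in neighbours[term])
            for term in neighbours
        }
        converged = max(abs(updated[t] - scores[t]) for t in scores) < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical, already-preprocessed tokens from a change request.
tokens = ["cache", "miss", "cache", "hit", "cache", "evict"]
ranked = sorted(textrank(tokens).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # cache
```

"cache" co-occurs with every other term, so it receives votes from all of them and ranks first; the top-ranked terms would then serve as candidate search terms.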

Showing 1–50 of 71 results for author: Roy, C