Showing 1–50 of 71 results for author: Roy, C

Searching in archive cs.
  1. arXiv:2405.03080  [pdf, other]

    cs.SI physics.soc-ph

    Homophilic organization of egocentric communities in ICT services

    Authors: Chandreyee Roy, Hang-Hyun Jo, János Kertész, Kimmo Kaski, János Török

    Abstract: Members of a society can be characterized by a large number of features, such as gender, age, ethnicity, religion, social status, and shared activities. One of the main tie-forming factors between individuals in human societies is homophily, the tendency of being attracted to similar others. Homophily has been mainly studied with focus on one of the features and little is known about the roles of… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures, 1 table

  2. arXiv:2402.04575  [pdf, other]

    cs.SE

    Can We Identify Stack Overflow Questions Requiring Code Snippets? Investigating the Cause & Effect of Missing Code Snippets

    Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: On the Stack Overflow (SO) Q&A site, users often request solutions to their code-related problems (e.g., errors, unexpected behavior). Unfortunately, they often miss required code snippets during their question submission, which could prevent their questions from getting prompt and appropriate answers. In this study, we conduct an empirical study investigating the cause & effect of missing code sn… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted for inclusion in the International Conference on Software Analysis, Evolution, and Reengineering (SANER 2024) technical program

  3. arXiv:2402.04568  [pdf, other]

    cs.SE

    Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution

    Authors: Saikat Mondal, Suborno Deb Bappon, Chanchal K. Roy

    Abstract: Prompt design plays a crucial role in shaping the efficacy of ChatGPT, influencing the model's ability to extract contextually accurate responses. Thus, optimal prompt construction is essential for maximizing the utility and performance of ChatGPT. However, sub-optimal prompt design may necessitate iterative refinement, as imprecise or ambiguous instructions can lead to undesired responses from Ch… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted at the 21st International Conference on Mining Software Repositories (MSR 2024)

  4. arXiv:2402.03735  [pdf, other]

    cs.SE

    Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study

    Authors: Joy Krishan Das, Saikat Mondal, Chanchal K. Roy

    Abstract: Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers are turning to AI tools like ChatGPT to enhance problem-solv… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted in MSR 2024

  5. arXiv:2312.03182  [pdf, other]

    cs.SE

    Investigating Technology Usage Span by Analyzing Users' Q&A Traces in Stack Overflow

    Authors: Saikat Mondal, Debajyoti Mondal, Chanchal K. Roy

    Abstract: Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. The selection of inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may lead to a complex learning curve and, thus, be more challengin… ▽ More

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted in the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

  6. arXiv:2311.13652  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Differences of communication activity and mobility patterns between urban and rural people

    Authors: Fumiko Ogushi, Chandreyee Roy, Kimmo Kaski

    Abstract: Human mobility and other social activity patterns influence various aspects of society such as urban planning, traffic predictions, crisis resilience, and epidemic prevention. The behaviour of individuals, like their communication frequencies and movements, are shaped by societal and socio-economic factors. In addition, the differences in the geolocation of people as well as their gender and age c… ▽ More

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 16 pages, 9 figures in main text

  7. arXiv:2309.06424  [pdf]

    cs.SE cs.AI cs.LG

    Unveiling the potential of large language models in generating semantic and cross-language clones

    Authors: Palash R. Roy, Ajmain I. Alam, Farouq Al-omari, Banani Roy, Chanchal K. Roy, Kevin A. Schneider

    Abstract: Semantic and Cross-language code clone generation may be useful for code reuse, code comprehension, refactoring and benchmarking. OpenAI's GPT model has potential in such clone generation as GPT is used for text generation. When developers copy/paste codes from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone posses… ▽ More

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted in IWSC

  8. arXiv:2308.13963  [pdf]

    cs.SE

    GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench

    Authors: Ajmain Inqiad Alam, Palash Ranjan Roy, Farouq Al-omari, Chanchal Kumar Roy, Banani Roy, Kevin Schneider

    Abstract: With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to utilize Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, it's worth noting t… ▽ More

    Submitted 1 September, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: Accepted in 39th IEEE International Conference on Software Maintenance and Evolution (ICSME 2023)

  9. arXiv:2306.16171  [pdf]

    cs.SE cs.AI cs.PL

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Authors: Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, Masoud Ekhtiarzadeh

    Abstract: Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the exist… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 49 pages, 10 figures, 6 tables

  10. arXiv:2306.14011  [pdf, other]

    cs.PF physics.comp-ph

    Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

    Authors: Weicheng Xue, Christopher John Roy

    Abstract: Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and threads within a block. Our approach utilizes fully conne… ▽ More

    Submitted 20 February, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

  11. arXiv:2305.18057  [pdf, other]

    cs.DC cs.PF

    CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver

    Authors: Weicheng Xue, Hongyu Wang, Christopher J. Roy

    Abstract: This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI, and the CPU-GPU heterogeneous computing workflow in SENSEI leveraging MPI and OpenACC are given. Then, a performance model for CPU-GPU heterogeneous computing… ▽ More

    Submitted 29 May, 2023; originally announced May 2023.

  12. arXiv:2304.03563  [pdf, other]

    cs.SE

    Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions

    Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: In Stack Overflow (SO), the quality of posts (i.e., questions and answers) is subjectively evaluated by users through a voting mechanism. The net votes (upvotes - downvotes) obtained by a post are often considered an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e… ▽ More

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted in the International Conference on Mining Software Repositories (MSR 2023)

  13. arXiv:2303.01435  [pdf, other]

    cs.SE

    Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection

    Authors: Subroto Nag Pinku, Debajyoti Mondal, Chanchal K. Roy

    Abstract: Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also e… ▽ More

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted at the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023)

    ACM Class: D.2; D.2.13

  14. arXiv:2210.03281  [pdf, other]

    cs.SE

    Automatic Prediction of Rejected Edits in Stack Overflow

    Authors: Saikat Mondal, Gias Uddin, Chanchal Roy

    Abstract: The content quality of shared knowledge in Stack Overflow (SO) is crucial in supporting software developers with their programming problems. Thus, SO allows its users to suggest edits to improve the quality of a post (i.e., question and answer). However, existing research shows that many suggested edits in SO are rejected due to undesired contents/formats or violating edit guidelines. Such a scena… ▽ More

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted for publication in Empirical Software Engineering (EMSE) journal

  15. arXiv:2204.11449  [pdf, other]

    cs.CV

    OCFormer: One-Class Transformer Network for Image Classification

    Authors: Prerana Mukherjee, Chandan Kumar Roy, Swalpa Kumar Roy

    Abstract: We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. In prior works, there have been tremendous efforts to learn a good representation using varieties of loss functions, whi… ▽ More

    Submitted 25 April, 2022; originally announced April 2022.

  16. Backports: Change Types, Challenges and Strategies

    Authors: Debasish Chakroborti, Kevin A. Schneider, Chanchal K. Roy

    Abstract: Source code repositories allow developers to manage multiple versions (or branches) of a software system. Pull-requests are used to modify a branch, and backporting is a regular activity used to port changes from a current development branch to other versions. In open-source software, backports are common and often need to be adapted by hand, which motivates us to explore backports and backporting… ▽ More

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: In 30th International Conference on Program Comprehension (ICPC 22), May 16 to 17, 2022, Virtual Event, Pittsburgh

  17. arXiv:2201.10137  [pdf, other]

    cs.SE cs.LG

    Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

    Authors: Md Nadim, Debajyoti Mondal, Chanchal K. Roy

    Abstract: The most common use of data visualization is to minimize the complexity for proper understanding. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that is challenging to comprehend if kept in a textual format. In this study, we propose a methodology to utilize the relational properties of source code in the… ▽ More

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: Has been accepted for publication in Automated Software Engineering (AUSE), an International Journal published by Springer

  18. Evaluating the Performance of Clone Detection Tools in Detecting Cloned Co-change Candidates

    Authors: Md Nadim, Manishankar Mondal, Chanchal K. Roy, Kevin Schneider

    Abstract: Co-change candidates are the group of code fragments that require a change if any of these fragments experience a modification in a commit operation during software evolution. The cloned co-change candidates are a subset of the co-change candidates, and the members in this subset are clones of one another. The cloned co-change candidates are usually created by reusing existing code fragments in a… ▽ More

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: Has been accepted for publication in The Journal of Systems & Software (JSS)

  19. arXiv:2112.07719  [pdf, other]

    cs.CV

    Decomposing the Deep: Finding Class Specific Filters in Deep CNNs

    Authors: Akshay Badola, Cherian Roy, Vineet Padmanabhan, Rajendra Lal

    Abstract: Interpretability of Deep Neural Networks has become a major area of exploration. Although these networks have achieved state of the art accuracy in many tasks, it is extremely difficult to interpret and explain their decisions. In this work we analyze the final and penultimate layers of Deep Convolutional Networks and provide an efficient method for identifying subsets of features that contribute… ▽ More

    Submitted 3 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: 22 pages, 5 figures, 8 tables. github repo: https://github.com/akshaybadola/cnn-class-specific-filters-with-histogram. Preprint submitted to Elsevier. This version contains visualization of filters and ablation study w.r.t. influential features

  20. arXiv:2111.12204  [pdf, other]

    cs.SE

    The Reproducibility of Programming-Related Issues in Stack Overflow Questions

    Authors: Saikat Mondal, Mohammad Masudur Rahman, Chanchal K. Roy, Kevin Schneider

    Abstract: Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and the description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments that may impede questions from receiving prompt and appropriate so… ▽ More

    Submitted 25 December, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: This study has been accepted for publication in Empirical Software Engineering (EMSE) journal

  21. arXiv:2111.03196  [pdf, other]

    cs.SE cs.LG

    An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets

    Authors: Gias Uddin, Yann-Gael Gueheneuc, Foutse Khomh, Chanchal K Roy

    Abstract: Sentiment analysis in software engineering (SE) has shown promise to analyze and support diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sen… ▽ More

    Submitted 4 November, 2021; originally announced November 2021.

    Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM), 2021

  22. arXiv:2109.03624  [pdf, other]

    physics.med-ph cs.LG eess.IV

    FaBiAN: A Fetal Brain magnetic resonance Acquisition Numerical phantom

    Authors: Hélène Lajous, Christopher W. Roy, Tom Hilbert, Priscille de Dumast, Sébastien Tourbier, Yasser Alemán-Gómez, Jérôme Yerly, Thomas Yu, Hamza Kebiri, Kelly Payette, Jean-Baptiste Ledoux, Reto Meuli, Patric Hagmann, Andras Jakab, Vincent Dunet, Mériam Koob, Tobias Kober, Matthias Stuber, Meritxell Bach Cuadra

    Abstract: Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, the number of acquisitions of satisfactory quality available in this cohort of s… ▽ More

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 23 pages, 9 figures (including Supplementary Material), 4 tables, 1 supplement. Submitted to Scientific Reports (2021)

  23. arXiv:2109.00659  [pdf, other]

    cs.SE

    Semantic Slicing of Architectural Change Commits: Towards Semantic Design Review

    Authors: Amit Kumar Mondal, Chanchal K. Roy, Kevin A. Schneider, Banani Roy, Sristy Sumana Nath

    Abstract: Software architectural changes involve more than one module or component and are complex to analyze compared to local code changes. Development teams aiming to review architectural aspects (design) of a change commit consider many essential scenarios such as access rules and restrictions on usage of program entities across modules. Moreover, design review is essential when proper architectural for… ▽ More

    Submitted 1 September, 2021; originally announced September 2021.

  24. arXiv:2108.09646  [pdf, other]

    cs.SE cs.IR cs.LG cs.NE

    A Systematic Review of Automated Query Reformulations in Source Code Search

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced deve… ▽ More

    Submitted 8 June, 2023; v1 submitted 22 August, 2021; originally announced August 2021.

    Comments: 81 pages, accepted at TOSEM

    ACM Class: D.2.5; D.2.1; D.2.7; D.2.13

  25. arXiv:2108.05341  [pdf, other]

    cs.SE cs.IR cs.LG

    The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study

    Authors: Mohammad Masudur Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

    Abstract: Being light-weight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on their used bug reports. A significant number of bug reports contain only plain natural language texts. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as searc… ▽ More

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: 57 pages, EMSE (2021)

    ACM Class: D.2; D.2.5; D.2.7

  26. arXiv:2108.02702  [pdf, other]

    cs.SE

    Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

    Authors: Rodrigo F. Silva, M. Masudur Rahman, Carlos Eduardo Dantas, Chanchal Roy, Foutse Khomh, Marcelo A. Maia

    Abstract: Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. Semantic gap may increase for similar bag of words due to antonyms and… ▽ More

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 31 pages, 5 figures, 9 tables

  27. arXiv:2102.08874  [pdf, other]

    cs.SE

    Mining API Usage Scenarios from Stack Overflow

    Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

    Abstract: We propose a framework to mine API usage scenarios from Stack Overflow. Each task consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task… ▽ More

    Submitted 17 February, 2021; originally announced February 2021.

    Journal ref: 2020 Information and Software Technology (IST)

  28. arXiv:2102.08502  [pdf, other]

    cs.SE

    Automatic API Usage Scenario Documentation from Technical Q&A Sites

    Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

    Abstract: The online technical Q&A site Stack Overflow (SO) is popular among developers to support their coding and diverse development needs. To address shortcomings in API official documentation resources, several studies have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. The techniques propose to add code examples/insights about APIs into its official… ▽ More

    Submitted 16 February, 2021; originally announced February 2021.

    Journal ref: 2021 ACM Transactions on Software Engineering and Methodology (TOSEM)

  29. An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC

    Authors: Weicheng Xue, Charles W. Jackson, Christopher J. Roy

    Abstract: This paper is focused on improving multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that using 16 P100 GPUs and 16 V100 GPUs can be 30$\times$ and 70$\times$ faster than 16 Xeon CPU E5-2680v4 cores for three different test cases, respectively. A series of performance issues related to the scaling… ▽ More

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 43 pages, 27 figures

  30. arXiv:2007.06544  [pdf]

    eess.IV cs.CV physics.med-ph

    Free-running SIMilarity-Based Angiography (SIMBA) for simplified anatomical MR imaging of the heart

    Authors: John Heerfordt, Kevin K. Whitehead, Jessica A. M. Bastiaansen, Lorenzo Di Sopra, Christopher W. Roy, Jérôme Yerly, Bastien Milani, Mark A. Fogel, Matthias Stuber, Davide Piccini

    Abstract: Purpose: Whole-heart MRA techniques typically target pre-determined motion states and address cardiac and respiratory dynamics independently. We propose a novel fast reconstruction algorithm, applicable to ungated free-running sequences, that leverages inherent similarities in the acquired data to avoid such physiological constraints. Theory and Methods: The proposed SIMilarity-Based Angiography… ▽ More

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: 8 figures, 2 tables

    Journal ref: Magnetic Resonance in Medicine, 24 February 2021

  31. arXiv:2006.15682  [pdf, other]

    cs.SE

    A Survey on the Evaluation of Clone Detection Performance and Benchmarking

    Authors: Jeffrey Svajlenko, Chanchal K. Roy

    Abstract: There are a great many clone detection tools proposed in the literature. In this paper, we investigate the state of clone detection tool evaluation. We begin by surveying the clone detection benchmarks, and performing a multi-faceted evaluation and comparison of their features and capabilities. We then survey the existing clone detection tool and technique publications, and evaluate how the author… ▽ More

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: 109 pages, review article, several figures and tables, and 277 references. It covers the whole area of clone detection and evaluation literature

  32. Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms

    Authors: Weicheng Xue, Christopher J. Roy

    Abstract: This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU. Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple GPUs… ▽ More

    Submitted 3 June, 2020; originally announced June 2020.

  33. arXiv:2005.02335  [pdf, other]

    cs.HC cs.AI cs.LG

    Don't Explain without Verifying Veracity: An Evaluation of Explainable AI with Video Activity Recognition

    Authors: Mahsan Nourani, Chiradeep Roy, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, Vibhav Gogate

    Abstract: Explainable machine learning and artificial intelligence models have been used to justify a model's decision-making process. This added transparency aims to help improve user performance and understanding of the underlying model. However, in practice, explainable systems face many open questions and challenges. Specifically, designers might reduce the complexity of deep learning models in order to… ▽ More

    Submitted 5 May, 2020; originally announced May 2020.

    ACM Class: H.1.2

  34. The Vision of Software Clone Management: Past, Present, and Future

    Authors: Chanchal K. Roy, Minhaz F. Zibran, Rainer Koschke

    Abstract: Duplicated code or code clones are a kind of code smell that have both positive and negative impacts on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while research in recent years extends to the whole spectrum of clone management. In the last decade, three surveys appeared in the literature, wh… ▽ More

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: 16 pages

    Journal ref: 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerp, 2014, pp. 18-33

  35. arXiv:2005.00967  [pdf, other]

    cs.SE

    A Machine Learning Based Framework for Code Clone Validation

    Authors: Golam Mostaeen, Banani Roy, Chanchal Roy, Kevin Schneider, Jeffrey Svajlenko

    Abstract: A code clone is a pair of code fragments, within or between software systems that are similar. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been proposed and studied over the last decade. To detect all possible similar source code patterns in general, the clone detection tools work on the syntax level whi… ▽ More

    Submitted 2 May, 2020; originally announced May 2020.

  36. An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset

    Authors: Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou, Debasish Chakroborti, Banani Roy, Chanchal K. Roy, Kevin Schneider

    Abstract: The fork-based development mechanism provides the flexibility and the unified processes for software teams to collaborate easily in a distributed setting without too much coordination overhead. Currently, multiple social coding platforms support fork-based development, such as GitHub, GitLab, and Bitbucket. Although these different platforms virtually share the same features, they have different em… ▽ More

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: Accepted at 17th International Conference on Mining Software Repositories, October 5--6, 2020, Seoul, Republic of Korea

  37. arXiv:1910.11125  [pdf, other]

    cs.DC cs.SE

    Micro-level Modularity of Computation-intensive Programs in Big Data Platforms: A Case Study with Image Data

    Authors: Amit Kumar Mondal, Banani Roy, Chanchal K. Roy, Kevin A. Schneider

    Abstract: With the rapid advancement of Big Data platforms such as Hadoop, Spark, and Dataflow, many tools are being developed that are intended to provide end users with an interactive environment for large-scale data analysis (e.g., IQmulus). However, there are challenges using these platforms. For example, developers find it difficult to use these platforms when developing interactive and reusable data a… ▽ More

    Submitted 19 October, 2019; originally announced October 2019.

  38. arXiv:1909.04238  [pdf, other]

    cs.SE

    LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

    Authors: Ming Wu, Pengcheng Wang, Kangqi Yin, Haoyu Cheng, Yun Xu, Chanchal K. Roy

    Abstract: To detect large-variance code clones (i.e. clones with relatively more differences) in large-scale code repositories is difficult because most current tools can only detect almost identical or very similar clones. It will make promotion and changes to some software applications such as bug detection, code completion, software analysis, etc. Recently, CCAligner made an attempt to detect clones with… ▽ More

    Submitted 9 September, 2019; originally announced September 2019.

  39. arXiv:1909.03166  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Equalizing Recourse across Groups

    Authors: Vivek Gupta, Pegah Nokhiz, Chitradeep Dutta Roy, Suresh Venkatasubramanian

    Abstract: The rise in machine learning-assisted decision-making has led to concerns about the fairness of the decisions and techniques to mitigate problems of discrimination. If a negative decision is made about an individual (denying a loan, rejecting an application for housing, and so on) justice dictates that we be able to ask how we might change circumstances to get a favorable decision the next time. M… ▽ More

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: 13 pages, 4 figures, 2 tables

  40. arXiv:1904.05514  [pdf, other]

    cs.LG cs.CV stat.ML

    Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach

    Authors: Proteek Chandan Roy, Vishnu Naresh Boddeti

    Abstract: Image recognition systems have demonstrated tremendous progress over the past few decades thanks, in part, to our ability of learning compact and robust representations of images. As we witness the widespread adoption of these systems, it is imperative to consider the problem of unintended leakage of information from an image representation, which might compromise the privacy of the data owner. T… ▽ More

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted for oral presentation at CVPR 2019

  41. arXiv:1903.07662  [pdf, other]

    cs.SE

    Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge

    Authors: Rodrigo F. G. Silva, Chanchal K. Roy, Mohammad Masudur Rahman, Kevin A. Schneider, Klerisson Paixao, Marcelo de Almeida Maia

    Abstract: Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might miss a succinct explanation. These pro…

    Submitted 20 March, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

    Comments: Accepted at ICPC, 12 pages, 2019

  42. arXiv:1902.03501  [pdf, other]

    cs.LG cs.HC stat.ML

    Assessing the Local Interpretability of Machine Learning Models

    Authors: Dylan Slack, Sorelle A. Friedler, Carlos Scheidegger, Chitradeep Dutta Roy

    Abstract: The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on two definitions of interpretability that have been introduced in the machine learning literature: simulatability (a user's ability to run a model on a given input…

    Submitted 2 August, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

  43. arXiv:1812.00975  [pdf, other]

    cs.LG stat.ML

    Structure Learning Using Forced Pruning

    Authors: Ahmed Abdelatty, Pracheta Sahoo, Chiradeep Roy

    Abstract: Markov networks are widely used in many Machine Learning applications including natural language processing, computer vision, and bioinformatics. Learning Markov networks has many complications, ranging from the intractable computations involved to the possibility of learning a model with a huge number of parameters. In this report, we provide a computationally tractable greedy heuristic for learning…

    Submitted 3 December, 2018; originally announced December 2018.

  44. arXiv:1808.00594  [pdf, other]

    cs.SE

    Improving IR-Based Bug Localization with Context-Aware Query Reformulation

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: Recent findings suggest that Information Retrieval (IR)-based bug localization techniques do not perform well if the bug report lacks rich structured information (e.g., relevant program entity names). Conversely, excessive structured information (e.g., stack traces) in the bug report might not always help the automated localization either. In this paper, we propose a novel technique--BLIZZARD--that aut…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: To be presented at The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018), FL, USA

    Journal ref: In Proc. ESEC/FSE 2018

  45. arXiv:1807.08798  [pdf, other]

    cs.SE

    Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a program…

    Submitted 23 July, 2018; originally announced July 2018.

    Comments: The 34th International Conference on Software Maintenance and Evolution (ICSME 2018), pp. 12, Madrid, Spain, September, 2018

    Journal ref: Proc. ICSME 2018

  46. Poster: Improving Bug Localization with Report Quality Dynamics and Query Reformulation

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: Recent findings from a user study suggest that IR-based bug localization techniques do not perform well if the bug report lacks rich structured information such as relevant program entity names. On the contrary, excessive structured information such as stack traces in the bug report might not always be helpful for the automated bug localization. In this paper, we conduct a large empirical study us…

    Submitted 19 July, 2018; originally announced July 2018.

    Comments: The 40th International Conference on Software Engineering (Companion volume, Poster Track) (ICSE 2018), pp. 348--349, Gothenburg, Sweden, May, 2018

    Journal ref: Proc. ICSE-C 2018, pp. 348--349

  47. arXiv:1807.04488  [pdf, other]

    cs.SE

    Improved Query Reformulation for Concept Location using CodeRank and Document Structures

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: The 32nd International Conference on Automated Software Engineering (ASE 2017), pp. 428-439, Urbana-Champaign, Illinois, USA, October, 2017

    Report number: 10.1109/ASE.2017.8115655

    Journal ref: Proc. ASE 2017, pp. 428-439

  48. arXiv:1807.04485  [pdf, other]

    cs.SE

    Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy, Raula G. Kula

    Abstract: Although peer code review is widely adopted in both commercial and open source development, existing studies suggest that such code reviews often contain a significant amount of non-useful review comments. Unfortunately, to date, no tools or techniques exist that can provide automatic support in improving those non-useful comments. In this paper, we first report a comparative study between useful…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: The 14th International Conference on Mining Software Repositories (MSR 2017), pp. 215--226, Buenos Aires, Argentina, May, 2017

    Report number: 10.1109/MSR.2017.17

    Journal ref: Proc. MSR 2017, pp. 215--226

  49. arXiv:1807.04479  [pdf, other]

    cs.SE

    RACK: Code Search in the IDE using Crowdsourced Knowledge

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy, David Lo

    Abstract: Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus require carefully designed queries containing information about programming APIs for code search. Unfortunately, existing studies suggest that preparing an effective query for code search is both challenging and time consuming for the developers. In t…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: The 39th International Conference on Software Engineering (Companion volume) (ICSE 2017), pp. 51--54, Buenos Aires, Argentina, May, 2017

    Report number: 10.1109/ICSE-C.2017.11

    Journal ref: Proc. ICSE-C 2017, pp. 51--54

  50. arXiv:1807.04475  [pdf, other]

    cs.SE

    STRICT: Information Retrieval Based Search Term Identification for Concept Location

    Authors: Mohammad Masudur Rahman, Chanchal K. Roy

    Abstract: During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts illustrate the change requirement involving various domain-related concepts. Software developers need to find appropriate search terms from those concepts so that they can locate the possible locations in the source code using…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017), pp. 79--90, Klagenfurt, Austria, February 2017

    Report number: 10.1109/SANER.2017.7884611

    Journal ref: Proc. SANER 2017, pp. 79--90