Search | arXiv e-print repository

Drop it All or Pick it Up? How Developers Responded to the Log4JShell Vulnerability

Authors: Vittunyuta Maeprasart, Ali Ouni, Raula Gaikovina Kula

Abstract: Although using third-party libraries has become prevalent in contemporary software development, developers often struggle to update their dependencies. Prior works acknowledge that due to the migration effort, priority and other issues cause lags in the migration process. The common assumption is that developers should drop all other activities and prioritize fixing the vulnerability. Our objectiv… ▽ More Although using third-party libraries has become prevalent in contemporary software development, developers often struggle to update their dependencies. Prior works acknowledge that due to the migration effort, priority and other issues cause lags in the migration process. The common assumption is that developers should drop all other activities and prioritize fixing the vulnerability. Our objective is to understand developer behavior when facing high-risk vulnerabilities in their code. We explore the prolific, and possibly one of the cases of the Log4JShell, a vulnerability that has the highest severity rating ever, which received widespread media attention. Using a mixed-method approach, we analyze 219 GitHub Pull Requests (PR) and 354 issues belonging to 53 Maven projects affected by the Log4JShell vulnerability. Our study confirms that developers show a quick response taking from 5 to 6 days. However, instead of drop** everything, surprisingly developer activities tend to increase for all pending issues and PRs. Developer discussions involved either giving information (29.3\%) and seeking information (20.6\%), which is missing in existing support tools. Leveraging this possibly-one of a kind event, insights opens up a new line of research, causing us to rethink best practices and what developers need in order to efficiently fix vulnerabilities. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Accepted to SERA24. arXiv admin note: text overlap with arXiv:2406.11362

arXiv:2406.11362 [pdf, other]

Characterising Contributions that Coincide with Vulnerability Mitigation in NPM Libraries

Authors: Ruksit Rojpaisarnkit, Hathaichanok Damrongsiri, Christoph Treude, Ali Ouni, Raula Gaikovina Kula

Abstract: With the urgent need to secure supply chains among Open Source libraries, attention has focused on mitigating vulnerabilities detected in these libraries. Although awareness has improved recently, most studies still report delays in the mitigation process. This suggests that developers still have to deal with other contributions that occur during the period of fixing vulnerabilities, such as coinc… ▽ More With the urgent need to secure supply chains among Open Source libraries, attention has focused on mitigating vulnerabilities detected in these libraries. Although awareness has improved recently, most studies still report delays in the mitigation process. This suggests that developers still have to deal with other contributions that occur during the period of fixing vulnerabilities, such as coinciding Pull Requests (PRs) and Issues, yet the impact of these contributions remains unclear. To characterize these contributions, we conducted a mixed-method empirical study to analyze NPM GitHub projects affected by 554 different vulnerability advisories, mining a total of 4,699 coinciding PRs and Issues. We believe that tool development and improved workload management for developers have the potential to create a more efficient and effective vulnerability mitigation process. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures, 3 tables, 22nd IEEE/ACIS International Conference on Software Engineering, Management and Applications (SERA 2024)

ACM Class: D.2.7; D.2.9

arXiv:2402.06035 [pdf, other]

AntiCopyPaster 2.0: Whitebox just-in-time code duplicates extraction

Authors: Eman Abdullah AlOmar, Benjamin Knobloch, Thomas Kain, Christopher Kalish, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: AntiCopyPaster is an IntelliJ IDEA plugin, implemented to detect and refactor duplicate code interactively as soon as a duplicate is introduced. The plugin only recommends the extraction of a duplicate when it is worth it. In contrast to current Extract Method refactoring approaches, our tool seamlessly integrates with the developer's workflow and actively provides recommendations for refactorings… ▽ More AntiCopyPaster is an IntelliJ IDEA plugin, implemented to detect and refactor duplicate code interactively as soon as a duplicate is introduced. The plugin only recommends the extraction of a duplicate when it is worth it. In contrast to current Extract Method refactoring approaches, our tool seamlessly integrates with the developer's workflow and actively provides recommendations for refactorings. This work extends our tool to allow developers to customize the detection rules, i.e., metrics, based on their needs and preferences. The plugin and its source code are publicly available on GitHub at https://github.com/refactorings/anti-copy-paster. The demonstration video can be found on YouTube: https://youtu.be/ Y1sbfpds2Ms. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.06013 [pdf, other]

How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations

Authors: Eman Abdullah AlOmar, Anushkrishna Venkatakrishnan, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

Abstract: Large Language Models (LLMs), like ChatGPT, have gained widespread popularity and usage in various software engineering tasks, including refactoring, testing, code review, and program comprehension. Despite recent studies delving into refactoring documentation in commit messages, issues, and code review, little is known about how developers articulate their refactoring needs when interacting with… ▽ More Large Language Models (LLMs), like ChatGPT, have gained widespread popularity and usage in various software engineering tasks, including refactoring, testing, code review, and program comprehension. Despite recent studies delving into refactoring documentation in commit messages, issues, and code review, little is known about how developers articulate their refactoring needs when interacting with ChatGPT. In this paper, our goal is to explore conversations between developers and ChatGPT related to refactoring to better understand how developers identify areas for improvement in code and how ChatGPT addresses developers' needs. Our approach relies on text mining refactoring-related conversations from 17,913 ChatGPT prompts and responses, and investigating developers' explicit refactoring intention. Our results reveal that (1) developer-ChatGPT conversations commonly involve generic and specific terms/phrases; (2) developers often make generic refactoring requests, while ChatGPT typically includes the refactoring intention; and (3) various learning settings when prompting ChatGPT in the context of refactoring. We envision that our findings contribute to a broader understanding of the collaboration between developers and AI models, in the context of code refactoring, with implications for model improvement, tool development, and best practices in software engineering. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2312.12600 [pdf, other]

Behind the Intent of Extract Method Refactoring: A Systematic Literature Review

Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: Code refactoring is widely recognized as an essential software engineering practice to improve the understandability and maintainability of the source code. The Extract Method refactoring is considered as "Swiss army knife" of refactorings, as developers often apply it to improve their code quality. In recent years, several studies attempted to recommend Extract Method refactorings allowing the co… ▽ More Code refactoring is widely recognized as an essential software engineering practice to improve the understandability and maintainability of the source code. The Extract Method refactoring is considered as "Swiss army knife" of refactorings, as developers often apply it to improve their code quality. In recent years, several studies attempted to recommend Extract Method refactorings allowing the collection, analysis, and revelation of actionable data-driven insights about refactoring practices within software projects. In this paper, we aim at reviewing the current body of knowledge on existing Extract Method refactoring research and explore their limitations and potential improvement opportunities for future research efforts. Hence, researchers and practitioners begin to be aware of the state-of-the-art and identify new research opportunities in this context. We review the body of knowledge related to Extract Method refactoring in the form of a systematic literature review (SLR). After compiling an initial pool of 1,367 papers, we conducted a systematic selection and our final pool included 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria including their methodology, applicability, and degree of automation. The results construct a catalog of 83 Extract Method approaches indicating that several techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) Several of the Extract Method tools incorporate the developer's involvement in the decision-making process when applying the method extraction, and (iii) the existing benchmarks are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2311.10753 [pdf, other]

Automating Source Code Refactoring in the Classroom

Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: Refactoring is the practice of improving software quality without altering its external behavior. Developers intuitively refactor their code for multiple purposes, such as improving program comprehension, reducing code complexity, dealing with technical debt, and removing code smells. However, no prior studies have exposed the students to an experience of the process of antipatterns detection and… ▽ More Refactoring is the practice of improving software quality without altering its external behavior. Developers intuitively refactor their code for multiple purposes, such as improving program comprehension, reducing code complexity, dealing with technical debt, and removing code smells. However, no prior studies have exposed the students to an experience of the process of antipatterns detection and refactoring correction, and provided students with toolset to practice it. To understand and increase the awareness of refactoring concepts, in this paper, we aim to reflect on our experience with teaching refactoring and how it helps students become more aware of bad programming practices and the importance of correcting them via refactoring. This paper discusses the results of an experiment in the classroom that involved carrying out various refactoring activities for the purpose of removing antipatterns using JDeodorant, an Eclipse plugin that supports antipatterns detection and refactoring. The results of the quantitative and qualitative analysis with 171 students show that students tend to appreciate the idea of learning refactoring and are satisfied with various aspects of the JDeodorant plugin's operation. Through this experiment, refactoring can turn into a vital part of the computing educational plan. We envision our findings enabling educators to support students with refactoring tools tuned towards safer and trustworthy refactoring. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.00256 [pdf]

How is Software Reuse Discussed in Stack Overflow?

Authors: Eman Abdullah AlOmara, Anthony Peruma, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni

Abstract: Software reuse is a crucial external quality attribute targeted by open-source and commercial projects. Despite that software reuse has experienced an increased adoption throughout the years, little is known about what aspects of code reuse developers discuss. In this paper, we present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code. Our find… ▽ More Software reuse is a crucial external quality attribute targeted by open-source and commercial projects. Despite that software reuse has experienced an increased adoption throughout the years, little is known about what aspects of code reuse developers discuss. In this paper, we present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code. Our findings show that 'visual studio' is the top occurring bigrams for question posts, and there are frequent design patterns utilized by developers for the purpose of reuse. We envision our findings enabling researchers to develop guidelines to be utilized to foster software reuse. △ Less

Submitted 31 October, 2023; originally announced November 2023.

Comments: 2023 Conference on Systems Engineering Research

arXiv:2302.03416 [pdf, other]

Just-in-Time Code Duplicates Extraction

Authors: Eman Abdullah AlOmar, Anton Ivanov, Zarina Kurbatova, Yaroslav Golubev, Mohamed Wiem Mkaouer, Ali Ouni, Timofey Bryksin, Le Nguyen, Amit Kini, Aditya Thakur

Abstract: Refactoring is a critical task in software maintenance, and is usually performed to enforce better design and coding practices, while co** with design defects. The Extract Method refactoring is widely used for merging duplicate code fragments into a single new method. Several studies attempted to recommend Extract Method refactoring opportunities using different techniques, including program sli… ▽ More Refactoring is a critical task in software maintenance, and is usually performed to enforce better design and coding practices, while co** with design defects. The Extract Method refactoring is widely used for merging duplicate code fragments into a single new method. Several studies attempted to recommend Extract Method refactoring opportunities using different techniques, including program slicing, program dependency graph analysis, change history analysis, structural similarity, and feature extraction. However, irrespective of the method, most of the existing approaches interfere with the developer's workflow: they require the developer to stop coding and analyze the suggested opportunities, and also consider all refactoring suggestions in the entire project without focusing on the development context. To increase the adoption of the Extract Method refactoring, in this paper, we aim to investigate the effectiveness of machine learning and deep learning algorithms for its recommendation while maintaining the workflow of the developer. The proposed approach relies on mining prior applied Extract Method refactorings and extracting their features to train a deep learning classifier that detects them in the user's code. We implemented our approach as a plugin for IntelliJ IDEA called AntiCopyPaster. To develop our approach, we trained and evaluated various popular models on a dataset of 18,942 code fragments from 13 Open Source Apache projects. The results show that the best model is the Convolutional Neural Network (CNN), which recommends appropriate Extract Method refactorings with an F-measure of 0.82. We also conducted a qualitative study with 72 developers to evaluate the usefulness of the developed plugin. The results show that developers tend to appreciate the idea of the approach and are satisfied with various aspects of the plugin's operation. △ Less

Submitted 7 February, 2023; originally announced February 2023.

Comments: 32 pages, 9 figures

arXiv:2206.07010 [pdf, other]

A Hierarchical-DBSCAN Method for Extracting Microservices from Monolithic Applications

Authors: Khaled Sellami, Mohamed Aymen Saied, Ali Ouni

Abstract: The microservices architectural style offers many advantages such as scalability, reusability and ease of maintainability. As such microservices has become a common architectural choice when develo** new applications. Hence, to benefit from these advantages, monolithic applications need to be redesigned in order to migrate to a microservice based architecture. Due to the inherent complexity and… ▽ More The microservices architectural style offers many advantages such as scalability, reusability and ease of maintainability. As such microservices has become a common architectural choice when develo** new applications. Hence, to benefit from these advantages, monolithic applications need to be redesigned in order to migrate to a microservice based architecture. Due to the inherent complexity and high costs related to this process, it is crucial to automate this task. In this paper, we propose a method that can identify potential microservices from a given monolithic application. Our method takes as input the source code of the source application in order to measure the similarities and dependencies between all of the classes in the system using their interactions and the domain terminology employed within the code. These similarity values are then used with a variant of a density-based clustering algorithm to generate a hierarchical structure of the recommended microservices while identifying potential outlier classes. We provide an empirical evaluation of our approach through different experimental settings including a comparison with existing human-designed microservices and a comparison with 5 baselines. The results show that our method succeeds in generating microservices that are overall more cohesive and that have fewer interactions in-between them with up to 0.9 of precision score when compared to human-designed microservices. △ Less

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.08116 [pdf, other]

On the Use of Refactoring in Security Vulnerability Fixes: An Exploratory Study on Maven Libraries

Authors: Ayano Ikegami, Raula Gaikovina Kula, Bodin Chinthanet, Vittunyuta Maeprasart, Ali Ouni, Takashi Ishio, Kenichi Matsumoto

Abstract: Third-party library dependencies are commonplace in today's software development. With the growing threat of security vulnerabilities, applying security fixes in a timely manner is important to protect software systems. As such, the community developed a list of software and hardware weakness known as Common Weakness Enumeration (CWE) to assess vulnerabilities. Prior work has revealed that mainten… ▽ More Third-party library dependencies are commonplace in today's software development. With the growing threat of security vulnerabilities, applying security fixes in a timely manner is important to protect software systems. As such, the community developed a list of software and hardware weakness known as Common Weakness Enumeration (CWE) to assess vulnerabilities. Prior work has revealed that maintenance activities such as refactoring code potentially correlate with security-related aspects in the source code. In this work, we explore the relationship between refactoring and security by analyzing refactoring actions performed jointly with vulnerability fixes in practice. We conducted a case study to analyze 143 maven libraries in which 351 known vulnerabilities had been detected and fixed. Surprisingly, our exploratory results show that developers incorporate refactoring operations in their fixes, with 31.9% (112 out of 351) of the vulnerabilities paired with refactoring actions. We envision this short paper to open up potential new directions to motivate automated tool support, allowing developers to deliver fixes faster, while maintaining their code. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Comments: Accepted as ERA paper to EASE2022

arXiv:2203.14404 [pdf, other]

Code Review Practices for Refactoring Changes: An Empirical Study on OpenStack

Authors: Eman Abdullah AlOmar, Moataz Chouchen, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: Modern code review is a widely used technique employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure adherence to coding standards and guidelines. During code review, developers may discuss refactoring activities before merging code changes in the code base. To date, code review has been extensively studied to explore its general challenges, b… ▽ More Modern code review is a widely used technique employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure adherence to coding standards and guidelines. During code review, developers may discuss refactoring activities before merging code changes in the code base. To date, code review has been extensively studied to explore its general challenges, best practices and outcomes, and socio-technical aspects. However, little is known about how refactoring is being reviewed and what developers care about when they review refactored code. Hence, in this work, we present a quantitative and qualitative study to understand what are the main criteria developers rely on to develop a decision about accepting or rejecting a submitted refactored code, and what makes this process challenging. Through a case study of 11,010 refactoring and non-refactoring reviews spread across OpenStack open-source projects, we find that refactoring-related code reviews take significantly longer to be resolved in terms of code review efforts. Moreover, upon performing a thematic analysis on a significant sample of the refactoring code review discussions, we built a comprehensive taxonomy consisting of 28 refactoring review criteria. We envision our findings reaffirming the necessity of develo** accurate and efficient tools and techniques that can assist developers in the review process in the presence of refactorings. △ Less

Submitted 27 March, 2022; originally announced March 2022.

arXiv:2203.10221 [pdf, other]

An Exploratory Study on Refactoring Documentation in Issues Handling

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

Abstract: Understanding the practice of refactoring documentation is of paramount importance in academia and industry. Issue tracking systems are used by most software projects enabling developers, quality assurance, managers, and users to submit feature requests and other tasks such as bug fixing and code review. Although recent studies explored how to document refactoring in commit messages, little is kno… ▽ More Understanding the practice of refactoring documentation is of paramount importance in academia and industry. Issue tracking systems are used by most software projects enabling developers, quality assurance, managers, and users to submit feature requests and other tasks such as bug fixing and code review. Although recent studies explored how to document refactoring in commit messages, little is known about how developers describe their refactoring needs in issues. In this study, we aim at exploring developer-reported refactoring changes in issues to better understand what developers consider to be problematic in their code and how they handle it. Our approach relies on text mining 45,477 refactoring-related issues and identifying refactoring patterns from a diverse corpus of 77 Java projects by investigating issues associated with 15,833 refactoring operations and developers' explicit refactoring intention. Our results show that (1) developers mostly use move refactoring related terms/phrases to target refactoring-related issues; and (2) developers tend to explicitly mention the improvement of specific quality attributes and focus on duplicate code removal. We envision our findings enabling tool builders to support developers with automated documentation of refactoring changes in issues. △ Less

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2203.05660 [pdf, other]

Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring

Authors: Anthony Peruma, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: To meet project timelines or budget constraints, developers intentionally deviate from writing optimal code to feasible code in what is known as incurring Technical Debt (TD). Furthermore, as part of planning their correction, developers document these deficiencies as comments in the code (i.e., self-admitted technical debt or SATD). As a means of improving source code quality, developers often ap… ▽ More To meet project timelines or budget constraints, developers intentionally deviate from writing optimal code to feasible code in what is known as incurring Technical Debt (TD). Furthermore, as part of planning their correction, developers document these deficiencies as comments in the code (i.e., self-admitted technical debt or SATD). As a means of improving source code quality, developers often apply a series of refactoring operations to their codebase. In this study, we explore developers repaying this debt through refactoring operations by examining occurrences of SATD removal in the code of 76 open-source Java systems. Our findings show that TD payment usually occurs with refactoring activities and developers refactor their code to remove TD for specific reasons. We envision our findings supporting vendors in providing tools to better support developers in the automatic repayment of technical debt. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: This study has been accepted for publication at: The 19th International Conference on Mining Software Repositories (MSR 2022)

arXiv:2112.15230 [pdf, other]

AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE

Authors: Eman Abdullah AlOmar, Anton Ivanov, Zarina Kurbatova, Yaroslav Golubev, Mohamed Wiem Mkaouer, Ali Ouni, Timofey Bryksin, Le Nguyen, Amit Kini, Aditya Thakur

Abstract: We developed a plugin for IntelliJ IDEA called AntiCopyPaster, which tracks the pasting of code fragments inside the IDE and suggests the appropriate Extract Method refactoring to combat the propagation of duplicates. Unlike the existing approaches, our tool is integrated with the developer's workflow, and pro-actively recommends refactorings. Since not all code fragments need to be extracted, we… ▽ More We developed a plugin for IntelliJ IDEA called AntiCopyPaster, which tracks the pasting of code fragments inside the IDE and suggests the appropriate Extract Method refactoring to combat the propagation of duplicates. Unlike the existing approaches, our tool is integrated with the developer's workflow, and pro-actively recommends refactorings. Since not all code fragments need to be extracted, we develop a classification model to make this decision. When a developer copies and pastes a code fragment, the plugin searches for duplicates in the currently opened file, waits for a short period of time to allow the developer to edit the code, and finally inferences the refactoring decision based on a number of features. Our experimental study on a large dataset of 18,942 code fragments mined from 13 Apache projects shows that AntiCopyPaster correctly recommends Extract Method refactorings with an F-score of 0.82. Furthermore, our survey of 59 developers reflects their satisfaction with the developed plugin's operation. The plugin and its source code are publicly available on GitHub at https://github.com/JetBrains-Research/anti-copy-paster. The demonstration video can be found on YouTube: https://youtu.be/_wwHg-qFjJY. △ Less

Submitted 2 September, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Comments: 4 pages, 3 figures

arXiv:2112.01581 [pdf, other]

On the Documentation of Refactoring Types

Authors: Eman Abdullah AlOmar, Jiaqian Liu, Kenneth Addo, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni, Zhe Yu

Abstract: Commit messages are the atomic level of software documentation. They provide a natural language description of the code change and its purpose. Messages are critical for software maintenance and program comprehension. Unlike documenting feature updates and bug fixes, little is known about how developers document their refactoring activities. Developers can perform multiple refactoring operations,… ▽ More Commit messages are the atomic level of software documentation. They provide a natural language description of the code change and its purpose. Messages are critical for software maintenance and program comprehension. Unlike documenting feature updates and bug fixes, little is known about how developers document their refactoring activities. Developers can perform multiple refactoring operations, including moving methods, extracting classes, for various reasons. Yet, there is no systematic study that analyzes the extent to which the documentation of refactoring accurately describes the refactoring operations performed at the source code level. Therefore, this paper challenges the ability of refactoring documentation to adequately predict the refactoring types, performed at the commit level. Our analysis relies on the text mining of commit messages to extract the corresponding features that better represent each class. The extraction of text patterns, specific to each refactoring allows the design of a model that verifies the consistency of these patterns with their corresponding refactoring. Such verification process can be achieved via automatically predicting the method-level type of refactoring being applied, namely Extract Method, Inline Method, Move Method, Pull-up Method, Push-down Method, and Rename Method. We compared various classifiers, and a baseline keyword-based approach, in terms of their prediction performance, using a dataset of 5,004 commits. Our main findings show that the complexity of refactoring type prediction varies from one type to another. Rename method and Extract method were found to be the best documented refactoring activities, while Pull-up Method and Push-down Method were the hardest to be identified via textual descriptions. Such findings bring the attention of developers to the necessity of paying more attention to the documentation of these types. △ Less

Submitted 2 December, 2021; originally announced December 2021.

Comments: arXiv admin note: text overlap with arXiv:2009.09279

arXiv:2111.07002 [pdf, other]

Refactoring for Reuse: An Empirical Study

Authors: Eman Abdullah AlOmar, Tianjia Wang, Vaibhavi Raut, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni

Abstract: Refactoring is the de-facto practice to optimize software health. While several studies propose refactoring strategies to optimize software design through applying design patterns and removing design defects, little is known about how developers actually refactor their code to improve its reuse. Therefore, we extract, from 1,828 open-source projects, a set of refactorings that were intended to imp… ▽ More Refactoring is the de-facto practice to optimize software health. While several studies propose refactoring strategies to optimize software design through applying design patterns and removing design defects, little is known about how developers actually refactor their code to improve its reuse. Therefore, we extract, from 1,828 open-source projects, a set of refactorings that were intended to improve the software reusability. We analyze the impact of reusability refactorings on the state-of-the-art reusability metrics, and we compare the distribution of reusability refactoring types, with the distribution of the remaining mainstream refactorings. Overall, we found that the distribution of refactoring types, applied in the context of reusability, is different from the distribution of refactoring types in mainstream development. In the refactorings performed to improve reusability, source files are subject to more design-level types of refactorings. Reusability refactorings significantly impact, high-level code elements, such as packages, classes, and methods, while typical refactorings, impact all code elements, including identifiers, and parameters. These findings provide practical insights into the current practice of refactoring in the context of code reuse involving the act of refactoring. △ Less

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2110.12229 [pdf, other]

doi 10.1007/s10664-021-10045-x

How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow

Authors: Anthony Peruma, Steven Simmons, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with… ▽ More An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with several studies altering between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about their adoption in practice. Analyzing the perception of developers is critical to understand better what developers consider to be problematic in their code and how they handle it. Additionally, there is a need for bridging the gap between refactoring, as research, and its adoption in practice, by extracting common refactoring intents that are more suitable for what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that Stack Overflow is utilized by a diverse set of developers for refactoring assistance for a variety of technologies. Our observations show five areas that developers typically require help with refactoring -- Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision our findings better bridge the support between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support. △ Less

Submitted 23 October, 2021; originally announced October 2021.

Comments: Part of a collection: Collective Knowledge in Software Engineering ISSN: 1382-3256 (Print) 1573-7616 (Online)

Journal ref: Empir Software Eng 27, 11 (2022)

arXiv:2109.15141 [pdf, other]

Predicting Code Review Completion Time in Modern Code Review

Authors: Moataz Chouchen, Jefferson Olongo, Ali Ouni, Mohamed Wiem Mkaouer

Abstract: Context. Modern Code Review (MCR) is being adopted in both open source and commercial projects as a common practice. MCR is a widely acknowledged quality assurance practice that allows early detection of defects as well as poor coding practices. It also brings several other benefits such as knowledge sharing, team awareness, and collaboration. Problem. In practice, code reviews can experience si… ▽ More Context. Modern Code Review (MCR) is being adopted in both open source and commercial projects as a common practice. MCR is a widely acknowledged quality assurance practice that allows early detection of defects as well as poor coding practices. It also brings several other benefits such as knowledge sharing, team awareness, and collaboration. Problem. In practice, code reviews can experience significant delays to be completed due to various socio-technical factors which can affect the project quality and cost. For a successful review process, peer reviewers should perform their review tasks in a timely manner while providing relevant feedback about the code change being reviewed. However, there is a lack of tool support to help developers estimating the time required to complete a code review prior to accepting or declining a review request. Objective. Our objective is to build and validate an effective approach to predict the code review completion time in the context of MCR and help developers better manage and prioritize their code review tasks. Method. We formulate the prediction of the code review completion time as a learning problem. In particular, we propose a framework based on regression models to (i) effectively estimate the code review completion time, and (ii) understand the main factors influencing code review completion time. △ Less

Submitted 30 September, 2021; originally announced September 2021.

Comments: 6 pages, 3 figures Registered report for ICSME21

Report number: 237

arXiv:2109.11089 [pdf, other]

Behind the Scenes: On the Relationship Between Developer Experience and Refactoring

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

Abstract: Refactoring is widely recognized as one of the efficient techniques to manage technical debt and maintain a healthy software project through enforcing best design practices or co** with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system's design and disposing of leadership roles in… ▽ More Refactoring is widely recognized as one of the efficient techniques to manage technical debt and maintain a healthy software project through enforcing best design practices or co** with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system's design and disposing of leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results by analyzing 800 open-source projects. We mine their refactoring activities, and we identify their corresponding contributors. Then, we associate an experience score to each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity. We found that (1) although refactoring is not restricted to a subset of developers, those with higher contribution scores tend to perform more refactorings than others; (2) while there is no correlation between experience and motivation behind refactoring, top contributed developers are found to perform a wider variety of refactoring operations, regardless of their complexity; and (3) top contributed developer tend to document less their refactoring activity. Our qualitative analysis of three randomly sampled projects shows that the developers who are responsible for the majority of refactoring activities are typically in advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to. △ Less

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2107.00073 [pdf, other]

SATDBailiff- Mining and Tracking Self-Admitted Technical Debt

Authors: Eman Abdullah AlOmar, Ben Christians, Mihal Busho, Ahmed Hamad AlKhalid, Ali Ouni, Christian Newman, Mohamed Wiem Mkaouer

Abstract: Self-Admitted Technical Debt (SATD) is a metaphorical concept to describe the self-documented addition of technical debt to a software project in the form of source code comments. SATD can linger in projects and degrade source-code quality, but it can also be more visible than unintentionally added or undocumented technical debt. Understanding the implications of adding SATD to a software project… ▽ More Self-Admitted Technical Debt (SATD) is a metaphorical concept to describe the self-documented addition of technical debt to a software project in the form of source code comments. SATD can linger in projects and degrade source-code quality, but it can also be more visible than unintentionally added or undocumented technical debt. Understanding the implications of adding SATD to a software project is important because developers can benefit from a better understanding of the quality trade-offs they are making. However, empirical studies, analyzing the survivability and removal of SATD comments, are challenged by potential code changes or SATD comment updates that may interfere with properly tracking their appearance, existence, and removal. In this paper, we propose SATDBailiff, a tool that uses an existing state-of-the-art SATD detection tool, to identify SATD in method comments, then properly track their lifespan. SATDBailiff is given as input links to open source projects, and its output is a list of all identified SATDs, and for each detected SATD, SATDBailiff reports all its associated changes, including any updates to its text, all the way to reporting its removal. The goal of SATDBailiff is to aid researchers and practitioners in better tracking SATDs instances and providing them with a reliable tool that can be easily extended. SATDBailiff was validated using a dataset of previously detected and manually validated SATD instances. SATDBailiff is publicly available as an open-source, along with the manual analysis of SATD instances associated with its validation, on the project website △ Less

Submitted 30 June, 2021; originally announced July 2021.

arXiv:2106.13900 [pdf, other]

On Preserving the Behavior in Software Refactoring: A Systematic Map** Study

Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni

Abstract: Context: Refactoring is the art of modifying the design of a system without altering its behavior. The idea is to reorganize variables, classes and methods to facilitate their future adaptations and comprehension. As the concept of behavior preservation is fundamental for refactoring, several studies, using formal verification, language transformation and dynamic analysis, have been proposed to mo… ▽ More Context: Refactoring is the art of modifying the design of a system without altering its behavior. The idea is to reorganize variables, classes and methods to facilitate their future adaptations and comprehension. As the concept of behavior preservation is fundamental for refactoring, several studies, using formal verification, language transformation and dynamic analysis, have been proposed to monitor the execution of refactoring operations and their impact on the program semantics. However, there is no existing study that examines the available behavior preservation strategies for each refactoring operation. Objective: This paper identifies behavior preservation approaches in the research literature. Method: We conduct, in this paper, a systematic map** study, to capture all existing behavior preservation approaches that we classify based on several criteria including their methodology, applicability, and their degree of automation. Results: The results indicate that several behavior preservation approaches have been proposed in the literature. The approaches vary between using formalisms and techniques, develo** automatic refactoring safety tools, and performing a manual analysis of the source code. Conclusion: Our taxonomy reveals that there exist some types of refactoring operations whose behavior preservation is under-researched. Our classification also indicates that several possible strategies can be combined to better detect any violation of the program semantics. △ Less

Submitted 21 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

arXiv:2104.14640 [pdf, other]

Test Smell Detection Tools: A Systematic Map** Study

Authors: Wajdi Aljedaani, Anthony Peruma, Ahmed Aljohani, Mazen Alotaibi, Mohamed Wiem Mkaouer, Ali Ouni, Christian D. Newman, Abdullatif Ghallab, Stephanie Ludi

Abstract: Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of thes… ▽ More Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of these published tools. In this paper, we provide a detailed catalog of all known, peer-reviewed, test smell detection tools. We start with performing a comprehensive search of peer-reviewed scientific publications to construct a catalog of 22 tools. Then, we perform a comparative analysis to identify the smell types detected by each tool and other salient features that include programming language, testing framework support, detection strategy, and adoption, among others. From our findings, we discover tools that detect test smells in Java, Scala, Smalltalk, and C++ test suites, with Java support favored by most tools. These tools are available as command-line and IDE plugins, among others. Our analysis also shows that most tools overlap in detecting specific smell types, such as General Fixture. Further, we encounter four types of techniques these tools utilize to detect smells. We envision our study as a one-stop source for researchers and practitioners in determining the tool appropriate for their needs. Our findings also empower the community with information to guide future tool development. △ Less

Submitted 3 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: Accepted at: The International Conference on Evaluation and Assessment in Software Engineering (EASE '21)

arXiv:2102.05201 [pdf, other]

Refactoring Practices in the Context of Modern Code Review: An Industrial Case Study at Xerox

Authors: Eman Abdullah AlOmar, Hussein AlRubaye, Mohamed Wiem Mkaouer, Ali Ouni, Marouane Kessentini

Abstract: Modern code review is a common and essential practice employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure conformance with coding standards. During code review, developers may inspect and discuss various changes including refactoring activities before merging code changes in the codebase. To date, code review has been extensively studied to… ▽ More Modern code review is a common and essential practice employed in both industrial and open-source projects to improve software quality, share knowledge, and ensure conformance with coding standards. During code review, developers may inspect and discuss various changes including refactoring activities before merging code changes in the codebase. To date, code review has been extensively studied to explore its general challenges, best practices and outcomes, and socio-technical aspects. However, little is known about how refactoring activities are being reviewed, perceived, and practiced. This study aims to reveal insights into how reviewers develop a decision about accepting or rejecting a submitted refactoring request, and what makes such review challenging. We present an industrial case study with 24 professional developers at Xerox. Particularly, we study the motivations, documentation practices, challenges, verification, and implications of refactoring activities during code review. Our study delivers several important findings. Our results report the lack of a proper procedure to follow by developers when documenting their refactorings for review. Our survey with reviewers has also revealed several difficulties related to understanding the refactoring intent and implications on the functional and non-functional aspects of the software. In light of our findings, we recommended a procedure to properly document refactoring activities, as part of our survey feedback. △ Less

Submitted 9 February, 2021; originally announced February 2021.

arXiv:2010.13890 [pdf, other]

doi 10.1016/j.eswa.2020.114176

How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni, Marouane Kessentini

Abstract: Refactoring is the art of improving the design of a system without altering its external behavior. Refactoring has become a well established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate re… ▽ More Refactoring is the art of improving the design of a system without altering its external behavior. Refactoring has become a well established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactorings in other development activities that go beyond improving the design. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, we aim to better understand what motivates developers to apply refactoring by mining and classifying a large set of 111,884 commits containing refactorings, extracted from 800 Java projects. We trained a multi-class classifier to categorize these commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories. This classification challenges the original definition of refactoring, being exclusive to improving the design and fixing code smells. Further, to better understand our classification results, we analyzed commit messages to extract textual patterns that developers regularly use to describe their refactorings. The results show that (1) fixing code smells is not the main driver for developers to refactoring their codebases. Refactoring is solicited for a wide variety of reasons, going beyond its traditional definition; (2) the distribution of refactorings differs between production and test files; (3) developers use several patterns to purposefully target refactoring; (4) the textual patterns, extracted from commit messages, provide better coverage for how developers document their refactorings. △ Less

Submitted 26 October, 2020; originally announced October 2020.

Journal ref: Expert Systems with Applications. 167 (2021) 114176

arXiv:2009.09279 [pdf, other]

Toward the Automatic Classification of Self-Affirmed Refactoring

Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers' explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we have manually identified refactoring patterns and defined three main common quality improvement categories, includ… ▽ More The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers' explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we have manually identified refactoring patterns and defined three main common quality improvement categories, including internal quality attributes, external quality attributes, and code smells, by only considering refactoring-related commits. However, this approach heavily depends on the manual inspection of commit messages. In this paper, we propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the refactoring common quality improvement categories. Specifically, we combine the N-Gram TF-IDF feature selection with binary and multiclass classifiers to build a new model to automate the classification of refactorings based on their quality improvement categories. We challenge our model using a total of 2,867 commit messages extracted from well-engineered open-source Java projects. Our findings show that (1) our model is able to accurately classify SAR commits, outperforming the pattern-based and random classifier approaches, and allowing the discovery of 40 more relevant SAR patterns, and (2) our model reaches an F-measure of up to 90% even with a relatively small training dataset. △ Less

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2005.06510 [pdf]

doi 10.1145/2729974

Many-Objective Software Remodularization using NSGA-III

Authors: Mohamed Wiem Mkaouer, Marouane Kessentini, Adnan Shaout, Patrice Koligheu, Slim Bechikh, Kalyanmoy Deb, Ali Ouni

Abstract: Software systems nowadays are complex and difficult to maintain due to continuous changes and bad design choices. To handle the complexity of systems, software products are, in general, decomposed in terms of packages/modules containing classes that are dependent. However, it is challenging to automatically remodularize systems to improve their maintainability. The majority of existing remodulariz… ▽ More Software systems nowadays are complex and difficult to maintain due to continuous changes and bad design choices. To handle the complexity of systems, software products are, in general, decomposed in terms of packages/modules containing classes that are dependent. However, it is challenging to automatically remodularize systems to improve their maintainability. The majority of existing remodularization work mainly satisfy one objective which is improving the structure of packages by optimizing coupling and cohesion. In addition, most of existing studies are limited to only few operation types such as move class and split packages. Many other objectives, such as the design semantics, reducing the number of changes and maximizing the consistency with development change history, are important to improve the quality of the software by remodularizing it. In this paper, we propose a novel many-objective search-based approach using NSGA-III. The process aims at finding the optimal remodularization solutions that improve the structure of packages, minimize the number of changes, preserve semantics coherence, and re-use the history of changes. We evaluate the efficiency of our approach using four different open-source systems and one automotive industry project, provided by our industrial partner, through a quantitative and qualitative study conducted with software engineers. △ Less

Submitted 13 May, 2020; originally announced May 2020.

Comments: Mkaouer, Wiem, et al. "Many-objective software remodularization using NSGA-III." ACM Transactions on Software Engineering and Methodology (TOSEM) 24.3 (2015): 1-45

Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM) 24, no. 3 (2015): 1-45

arXiv:1907.07724 [pdf]

How Does API Migration Impact Software Quality and Comprehension? An Empirical Study

Authors: Hussein Alrubaye, Deema Alshoaibi, Eman Alomar, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: The migration process between different third-party software libraries is hard, complex and error-prone. Typically, during a library migration process, developers opt to replace methods from the retired library with other methods from a new library without altering the software behavior. However, the extent to which such a migration process to new libraries will be rewarded with an improved softwa… ▽ More The migration process between different third-party software libraries is hard, complex and error-prone. Typically, during a library migration process, developers opt to replace methods from the retired library with other methods from a new library without altering the software behavior. However, the extent to which such a migration process to new libraries will be rewarded with an improved software quality is still unknown. In this paper, we aim at studying and analyzing the impact of library API migration on software quality. We conduct a large-scale empirical study on 9 popular API migrations, collected from a corpus of 57,447 open-source Java projects. We compute the values of commonly-used software quality metrics before and after a migration occurs. The statistical analysis of the obtained results provides evidence that library migrations are likely to improve different software quality attributes including significantly reduced coupling, increased cohesion, and improved code readability. Furthermore, we release an online portal that helps software developers to understand the pre-impact of a library migration on software quality and recommend migration examples that adopt the best design and implementation practices to improve software quality. Finally, we provide the software engineering community with a large scale dataset to foster research in software library migration. △ Less

Submitted 2 October, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

Comments: Accepted at International Conference on Software and Systems Reuse (ICSR-2020)

arXiv:1907.04797 [pdf, other]

doi 10.1109/ESEM.2019.8870177

Do Design Metrics Capture Developers Perception of Quality? An Empirical Study on Self-Affirmed Refactoring Activities

Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni, Marouane Kessentini

Abstract: Background. Refactoring is a critical task in software maintenance and is generally performed to enforce the best design and implementation practices or to cope with design defects. Several studies attempted to detect refactoring activities through mining software repositories allowing to collect, analyze and get actionable data-driven insights about refactoring practices within software projects.… ▽ More Background. Refactoring is a critical task in software maintenance and is generally performed to enforce the best design and implementation practices or to cope with design defects. Several studies attempted to detect refactoring activities through mining software repositories allowing to collect, analyze and get actionable data-driven insights about refactoring practices within software projects. Aim. We aim at identifying, among the various quality models presented in the literature, the ones that are more in-line with the developer's vision of quality optimization, when they explicitly mention that they are refactoring to improve them. Method. We extract a large corpus of design-related refactoring activities that are applied and documented by developers during their daily changes from 3,795 curated open source Java projects. In particular, we extract a large-scale corpus of structural metrics and anti-pattern enhancement changes, from which we identify 1,245 quality improvement commits with their corresponding refactoring operations, as perceived by software engineers. Thereafter, we empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics. Results. The statistical analysis of the obtained results shows that (i) a few state-of-the-art metrics are more popular than others; and (ii) some metrics are being more emphasized than others. Conclusions. We verify that there are a variety of structural metrics that can represent the internal quality attributes with different degrees of improvement and degradation of software quality. Most of the metrics that are mapped to the main quality attributes do capture developer intentions of quality improvement reported in the commit messages. △ Less

Submitted 10 July, 2019; originally announced July 2019.

Comments: technical paper accepted in 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

Journal ref: 2019 {ACM/IEEE} International Symposium on Empirical Software Engineering and Measurement, {ESEM} 2019, Porto de Galinhas, Recife, Brazil, September 19-20, 2019

arXiv:1907.02997 [pdf]

MigrationMiner: An Automated Detection Tool of Third-Party Java Library Migration at the Method Level

Authors: Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: In this paper we introduce, MigrationMiner, an automated tool that detects code migrations performed between Java third-party library. Given a list of open source projects, the tool detects potential library migration code changes and collects the specific code fragments in which the developer replaces methods from the retired library with methods from the new library. To support the migration pro… ▽ More In this paper we introduce, MigrationMiner, an automated tool that detects code migrations performed between Java third-party library. Given a list of open source projects, the tool detects potential library migration code changes and collects the specific code fragments in which the developer replaces methods from the retired library with methods from the new library. To support the migration process, MigrationMiner collects the library documentation that is associated with every method involved in the migration. We evaluate our tool on a benchmark of manually validated library migrations. Results show that MigrationMiner achieves an accuracy of 100%. A demo video of MigrationMiner is available at https://youtu.be/sAlR1HNetXc. △ Less

Submitted 13 July, 2019; v1 submitted 5 July, 2019; originally announced July 2019.

Comments: Accepted at ICSME 2019. arXiv admin note: text overlap with arXiv:1906.02591

arXiv:1906.02882 [pdf]

Learning to Recommend Third-Party Library Migration Opportunities at the API Level

Authors: Hussein Alrubaye, Mohamed Wiem Mkaouer, Igor Khokhlov, Leon Reznik, Ali Ouni, Jason Mcgoff

Abstract: The manual migration between different third-party libraries represents a challenge for software developers. Developers typically need to explore both libraries Application Programming Interfaces, along with reading their documentation, in order to locate the suitable map**s between replacing and replaced methods. In this paper, we introduce RAPIM, a novel machine learning approach that recommen… ▽ More The manual migration between different third-party libraries represents a challenge for software developers. Developers typically need to explore both libraries Application Programming Interfaces, along with reading their documentation, in order to locate the suitable map**s between replacing and replaced methods. In this paper, we introduce RAPIM, a novel machine learning approach that recommends map**s between methods from two different libraries. Our model learns from previous migrations, manually performed in mined software systems, and extracts a set of features related to the similarity between method signatures and method textual documentation. We evaluate our model using 8 popular migrations, collected from 57,447 open-source Java projects. Results show that RAPIM is able to recommend relevant library API map**s with an average accuracy score of 87%. Finally, we provide the community with an API recommendation web service that could be used to support the migration process. △ Less

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1906.02591 [pdf]

On the Use of Information Retrieval to Automate the Detection of Third-Party Java Library Migration at the Method Level

Authors: Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: The migration process between different third-party libraries is hard, complex and error-prone. Typically, during a library migration, developers need to find methods in the new library that are most adequate in replacing the old methods of the retired library. This process is subjective and time-consuming as developers need to fully understand the documentation of both libraries Application Progr… ▽ More The migration process between different third-party libraries is hard, complex and error-prone. Typically, during a library migration, developers need to find methods in the new library that are most adequate in replacing the old methods of the retired library. This process is subjective and time-consuming as developers need to fully understand the documentation of both libraries Application Programming Interfaces, and find the right matching between their methods if it exists. In this context, several studies rely on mining existing library migrations to provide developers with by-example approaches for similar scenarios. In this paper, we introduce a novel mining approach that extracts existing instances of library method replacements that are manually performed by developers for a given library migration to automatically generate migration patterns in the method level. Thereafter, our approach combines the mined method-change patterns with method-related lexical similarity to accurately detect map**s between replacing/replaced methods. We conduct a large scale empirical study to evaluate our approach on a benchmark of 57,447 open-source Java projects leading to 9 popular library migrations. Our qualitative results indicate that our approach significantly increases the accuracy of mining method-level map**s by an average accuracy of 12%, as well as increasing the number of discovered method map**s, in comparison with existing state-of-the-art studies. Finally, we provide the community with an open source mining tool along with a dataset of all mined migrations at the method level. △ Less

Submitted 2 June, 2019; originally announced June 2019.

Comments: accepted at 27th IEEE/ACM International Conference on Program Comprehension 2019

Journal ref: 27th IEEE/ACM International Conference on Program Comprehension 2019

arXiv:1709.09474 [pdf, other]

An Empirical Study on the Impact of Refactoring Activities on Evolving Client-Used APIs

Authors: Raula Gaikovina Kula, Ali Ouni, Daniel M. German, Katsuro Inoue

Abstract: Context: Refactoring is recognized as an effective practice to maintain evolving software systems. For software libraries, we study how library developers refactor their Application Programming Interfaces (APIs), especially when it impacts client users by breaking an API of the library. Objective: Our work aims to understand how clients that use a library API are affected by refactoring activities… ▽ More Context: Refactoring is recognized as an effective practice to maintain evolving software systems. For software libraries, we study how library developers refactor their Application Programming Interfaces (APIs), especially when it impacts client users by breaking an API of the library. Objective: Our work aims to understand how clients that use a library API are affected by refactoring activities. We target popular libraries that potentially impact more library client users. Method: We distinguish between library APIs based on their client-usage (refereed to as client-used APIs) in order to understand the extent to which API breakages relate to refactorings. Our tool-based approach allows for a large-scale study across eight libraries (i.e., totaling 183 consecutive versions) with around 900 clients projects. Results: We find that library maintainers are less likely to break client-used API classes. Quantitatively, we find that refactoring activities break less than 37% of all client-used APIs. In a more qualitative analysis, we show two documented cases of where non-refactoring API breaking changes are motivated other maintenance issues (i.e., bug fix and new features) and involve more complex refactoring operations. Conclusion: Using our automated approach, we find that library developers are less likely to break APIs and tend to break client-used APIs when addressing these maintenance issues. △ Less

Submitted 28 September, 2017; v1 submitted 27 September, 2017; originally announced September 2017.

Comments: Information and Software Technology Journal

arXiv:1709.04638 [pdf, other]

On the Impact of Micro-Packages: An Empirical Study of the npm JavaScript Ecosystem

Authors: Raula Gaikovina Kula, Ali Ouni, Daniel M. German, Katsuro Inoue

Abstract: The rise of user-contributed Open Source Software (OSS) ecosystems demonstrate their prevalence in the software engineering discipline. Libraries work together by depending on each other across the ecosystem. From these ecosystems emerges a minimized library called a micro-package. Micro- packages become problematic when breaks in a critical ecosystem dependency ripples its effects to unsuspecting… ▽ More The rise of user-contributed Open Source Software (OSS) ecosystems demonstrate their prevalence in the software engineering discipline. Libraries work together by depending on each other across the ecosystem. From these ecosystems emerges a minimized library called a micro-package. Micro- packages become problematic when breaks in a critical ecosystem dependency ripples its effects to unsuspecting users. In this paper, we investigate the impact of micro-packages in the npm JavaScript ecosystem. Specifically, we conducted an empirical in- vestigation with 169,964 JavaScript npm packages to understand (i) the widespread phenomena of micro-packages, (ii) the size dependencies inherited by a micro-package and (iii) the developer usage cost (ie., fetch, install, load times) of using a micro-package. Results of the study find that micro-packages form a significant portion of the npm ecosystem. Apart from the ease of readability and comprehension, we show that some micro-packages have long dependency chains and incur just as much usage costs as other npm packages. We envision that this work motivates the need for developers to be aware of how sensitive their third-party dependencies are to critical changes in the software ecosystem. △ Less

Submitted 14 September, 2017; originally announced September 2017.

Comments: Submitted 2017

arXiv:1709.04621 [pdf, other]

doi 10.1007/s10664-017-9521-5

Do Developers Update Their Library Dependencies? An Empirical Study on the Impact of Security Advisories on Library Migration

Authors: Raula Gaikovina Kula, Daniel M. German, Ali Ouni, Takashi Ishio, Katsuro Inoue

Abstract: Third-party library reuse has become common practice in contemporary software development, as it includes several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In… ▽ More Third-party library reuse has become common practice in contemporary software development, as it includes several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In this paper, we investigate the extent of which developers update their library dependencies. Specifically, we conducted an empirical study on library migration that covers over 4,600 GitHub software projects and 2,700 library dependencies. Results show that although many of these systems rely heavily on dependencies, 81.5% of the studied systems still keep their outdated dependencies. In the case of updating a vulnerable dependency, the study reveals that affected developers are not likely to respond to a security advisory. Surveying these developers, we find that 69% of the interviewees claim that they were unaware of their vulnerable dependencies. Furthermore, developers are not likely to prioritize library updates, citing it as extra effort and added responsibility. This study concludes that even though third-party reuse is commonplace, the practice of updating a dependency is not as common for many developers. △ Less

Submitted 14 September, 2017; originally announced September 2017.

Comments: 37 Pages

Journal ref: Empirical Software Engineering 2017

arXiv:1612.01626 [pdf, other]

Automated Inference of Software Library Usage Patterns

Authors: Mohamed Aymen Saied, Ali Ouni, Houari Sahraoui, Raula Gaikovina Kula, Katsuro Inoue, David Lo

Abstract: Modern software systems are increasingly dependent on third-party libraries. It is widely recognized that using mature and well-tested third-party libraries can improve developers' productivity, reduce time-to-market, and produce more reliable software. Today's open-source repositories provide a wide range of libraries that can be freely downloaded and used. However, as software libraries are docu… ▽ More Modern software systems are increasingly dependent on third-party libraries. It is widely recognized that using mature and well-tested third-party libraries can improve developers' productivity, reduce time-to-market, and produce more reliable software. Today's open-source repositories provide a wide range of libraries that can be freely downloaded and used. However, as software libraries are documented separately but intended to be used together, developers are unlikely to fully take advantage of these reuse opportunities. In this paper, we present a novel approach to automatically identify third-party library usage patterns, i.e., collections of libraries that are commonly used together by developers. Our approach employs hierarchical clustering technique to group together software libraries based on external client usage. To evaluate our approach, we mined a large set of over 6,000 popular libraries from Maven Central Repository and investigated their usage by over 38,000 client systems from the Github repository. Our experiments show that our technique is able to detect the majority (77%) of highly consistent and cohesive library usage patterns across a considerable number of client systems. △ Less

Submitted 5 December, 2016; originally announced December 2016.

Showing 1–35 of 35 results for author: Ouni, A