Search | arXiv e-print repository

Impostor Syndrome in Final Year Computer Science Students: An Eye Tracking and Biometrics Study

Authors: Alyssia Chen, Carol Wong, Katy Tarrit, Anthony Peruma

Abstract: Imposter syndrome is a psychological phenomenon that affects individuals who doubt their skills and abilities, despite possessing the necessary competencies. This can lead to a lack of confidence and poor performance. While research has explored the impacts of imposter syndrome on students and professionals in various fields, there is limited knowledge on how it affects code comprehension in softw… ▽ More Imposter syndrome is a psychological phenomenon that affects individuals who doubt their skills and abilities, despite possessing the necessary competencies. This can lead to a lack of confidence and poor performance. While research has explored the impacts of imposter syndrome on students and professionals in various fields, there is limited knowledge on how it affects code comprehension in software engineering. In this exploratory study, we investigate the prevalence of imposter syndrome among final-year undergraduate computer science students and its effects on their code comprehension cognition using an eye tracker and heart rate monitor. Key findings demonstrate that students identifying as male exhibit lower imposter syndrome levels when analyzing code, and higher imposter syndrome is associated with increased time reviewing a code snippet and a lower likelihood of solving it correctly. This study provides initial data on this topic and establishes a foundation for further research to support student academic success and improve developer productivity and mental well-being. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at: 18th International Conference, AC 2024, Held as Part of the 26th HCI International Conference, HCII 2024

arXiv:2404.10185 [pdf, other]

Insights from the Field: Exploring Students' Perspectives on Bad Unit Testing Practices

Authors: Anthony Peruma, Eman Abdullah AlOmar, Wajdi Aljedaani, Christian D. Newman, Mohamed Wiem Mkaouer

Abstract: Educating students about software testing practices is integral to the curricula of many computer science-related courses and typically involves students writing unit tests. Similar to production/source code, students might inadvertently deviate from established unit testing best practices, and introduce problematic code, referred to as test smells, into their test suites. Given the extensive cata… ▽ More Educating students about software testing practices is integral to the curricula of many computer science-related courses and typically involves students writing unit tests. Similar to production/source code, students might inadvertently deviate from established unit testing best practices, and introduce problematic code, referred to as test smells, into their test suites. Given the extensive catalog of test smells, it becomes challenging for students to identify test smells in their code, especially for those who lack experience with testing practices. In this experience report, we aim to increase students' awareness of bad unit testing practices, and detail the outcomes of having 184 students from three higher educational institutes utilize an IDE plugin to automatically detect test smells in their code. Our findings show that while students report on the plugin's usefulness in learning about and detecting test smells, they also identify specific test smells that they consider harmless. We anticipate that our findings will support academia in refining course curricula on unit testing and enabling educators to support students with code review strategies of test code. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at: 2024 Innovation and Technology in Computer Science Education V. 1 (ITiCSE 2024)

arXiv:2311.00256 [pdf]

How is Software Reuse Discussed in Stack Overflow?

Authors: Eman Abdullah AlOmara, Anthony Peruma, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni

Abstract: Software reuse is a crucial external quality attribute targeted by open-source and commercial projects. Despite that software reuse has experienced an increased adoption throughout the years, little is known about what aspects of code reuse developers discuss. In this paper, we present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code. Our find… ▽ More Software reuse is a crucial external quality attribute targeted by open-source and commercial projects. Despite that software reuse has experienced an increased adoption throughout the years, little is known about what aspects of code reuse developers discuss. In this paper, we present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code. Our findings show that 'visual studio' is the top occurring bigrams for question posts, and there are frequent design patterns utilized by developers for the purpose of reuse. We envision our findings enabling researchers to develop guidelines to be utilized to foster software reuse. △ Less

Submitted 31 October, 2023; originally announced November 2023.

Comments: 2023 Conference on Systems Engineering Research

arXiv:2303.04234 [pdf, other]

Do the Test Smells Assertion Roulette and Eager Test Impact Students' Troubleshooting and Debugging Capabilities?

Authors: Wajdi Aljedaani, Mohamed Wiem Mkaouer, Anthony Peruma, Stephanie Ludi

Abstract: To ensure the quality of a software system, developers perform an activity known as unit testing, where they write code (known as test cases) that verifies the individual software units that make up the system. Like production code, test cases are subject to bad programming practices, known as test smells, that hurt maintenance activities. An essential part of most maintenance activities is progra… ▽ More To ensure the quality of a software system, developers perform an activity known as unit testing, where they write code (known as test cases) that verifies the individual software units that make up the system. Like production code, test cases are subject to bad programming practices, known as test smells, that hurt maintenance activities. An essential part of most maintenance activities is program comprehension which involves developers reading the code to understand its behavior to fix issues or update features. In this study, we conduct a controlled experiment with 96 undergraduate computer science students to investigate the impact of two common types of test smells, namely Assertion Roulette and Eager Test, on a student's ability to debug and troubleshoot test case failures. Our findings show that students take longer to correct errors in production code when smells are present in their associated test cases, especially Assertion Roulette. We envision our findings supporting academia in better equip** students with the knowledge and resources in writing and maintaining high-quality test cases. Our experimental materials are available online: https://wajdialjedaani.github.io/testsmellstd/ △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: This study has been accepted at: The 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023); Track: Software Engineering Education and Training

arXiv:2303.02258 [pdf, other]

An Exploratory Study on the Occurrence of Self-Admitted Technical Debt in Android Apps

Authors: Gregory Wilder II, Riley Miyamoto, Samuel Watson, Rick Kazman, Anthony Peruma

Abstract: Technical debt describes situations where developers write less-than-optimal code to meet project milestones. However, this debt accumulation often results in future developer effort to live with or fix these quality issues. To better manage this debt, developers may document their sub-optimal code as comments in the code (i.e., self-admitted technical debt or SATD). While prior research has inves… ▽ More Technical debt describes situations where developers write less-than-optimal code to meet project milestones. However, this debt accumulation often results in future developer effort to live with or fix these quality issues. To better manage this debt, developers may document their sub-optimal code as comments in the code (i.e., self-admitted technical debt or SATD). While prior research has investigated the occurrence and characteristics of SATD, this research has primarily focused on non-mobile systems. With millions of mobile applications (apps) in multiple genres available for end-users, there is a lack of research on sub-optimal code developers intentionally implement in mobile apps. In this study, we examine the occurrence and characteristics of SATD in 15,614 open-source Android apps. Our findings show that even though such apps contain occurrences of SATD, the volume per app (a median of 4) is lower than in non-mobile systems, with most debt categorized as Code Debt. Additionally, we identify typical elements in an app that are prone to intentional sub-optimal implementations. We envision our findings supporting researchers and tool vendors with building tools and techniques to support app developers with app maintenance. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: This study has been accepted at: The 6th International Conference on Technical Debt (TechDebt 2023)

arXiv:2303.01035 [pdf, other]

Performance Comparison of Binary Machine Learning Classifiers in Identifying Code Comment Types: An Exploratory Study

Authors: Amila Indika, Peter Y. Washington, Anthony Peruma

Abstract: Code comments are vital to source code as they help developers with program comprehension tasks. Written in natural language (usually English), code comments convey a variety of different information, which are grouped into specific categories. In this study, we construct 19 binary machine learning classifiers for code comment categories that belong to three different programming languages. We pre… ▽ More Code comments are vital to source code as they help developers with program comprehension tasks. Written in natural language (usually English), code comments convey a variety of different information, which are grouped into specific categories. In this study, we construct 19 binary machine learning classifiers for code comment categories that belong to three different programming languages. We present a comparison of performance scores for different types of machine learning classifiers and show that the Linear SVC classifier has the highest average F1 score of 0.5474. △ Less

Submitted 2 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: This study has been accepted at: The 2nd International Workshop on Natural Language-based Software Engineering (NLBSE 2023); Tool Competition Track

arXiv:2303.00169 [pdf, other]

An Exploratory Study on the Usage and Readability of Messages Within Assertion Methods of Test Cases

Authors: Taryn Takebayashi, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman

Abstract: Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for the assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages… ▽ More Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for the assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages contained in the test methods in 20 open-source Java systems. Our findings show that while developers rarely utilize the option of supplying a message, those who do, either compose it of only string literals, identifiers, or a combination of both types. Using standard English readability measuring techniques, we observe that a beginner's knowledge of English is required to understand messages containing only identifiers, while a 4th-grade education level is required to understand messages composed of string literals. We also discuss shortcomings with using such readability measuring techniques and common anti-patterns in assert message construction. We envision our results incorporated into code quality tools that appraise the understandability of assertion messages. △ Less

Submitted 28 February, 2023; originally announced March 2023.

Comments: This study has been accepted at: The 2nd International Workshop on Natural Language-based Software Engineering (NLBSE 2023)

arXiv:2302.11632 [pdf, other]

Rename Chains: An Exploratory Study on the Occurrence and Characteristics of Identifiers Undergoing Multiple Renamings

Authors: Anthony Peruma, Christian D. Newman

Abstract: Identifier names play a significant role in program comprehension activities, with high-quality names improving developer productivity and system quality. To correct poor-quality names, developers rename identifiers to reflect their intended purpose better. However, renames do not always result in high-quality, long-lasting names; in many cases, developers perform multiple rename operations on the… ▽ More Identifier names play a significant role in program comprehension activities, with high-quality names improving developer productivity and system quality. To correct poor-quality names, developers rename identifiers to reflect their intended purpose better. However, renames do not always result in high-quality, long-lasting names; in many cases, developers perform multiple rename operations on the same identifier throughout the system's lifetime. In this paper, we report on a large-scale empirical study that examines the occurrence of identifiers undergoing multiple renames (i.e., rename chains). Our findings show the presence of rename chains in almost every project, with methods typically having more rename chains than other identifier types. Furthermore, it is usually the same developer responsible for creating all renames within a chain, with most names maintaining the same grammatical structure. Understanding rename chains can help us provide stronger advice, and targeted research, on how to craft high-quality, long-lasting identifiers. △ Less

Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: This study has been accepted at: The 6th International Workshop on Refactoring (IWoR 2022)

arXiv:2203.10221 [pdf, other]

An Exploratory Study on Refactoring Documentation in Issues Handling

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

Abstract: Understanding the practice of refactoring documentation is of paramount importance in academia and industry. Issue tracking systems are used by most software projects enabling developers, quality assurance, managers, and users to submit feature requests and other tasks such as bug fixing and code review. Although recent studies explored how to document refactoring in commit messages, little is kno… ▽ More Understanding the practice of refactoring documentation is of paramount importance in academia and industry. Issue tracking systems are used by most software projects enabling developers, quality assurance, managers, and users to submit feature requests and other tasks such as bug fixing and code review. Although recent studies explored how to document refactoring in commit messages, little is known about how developers describe their refactoring needs in issues. In this study, we aim at exploring developer-reported refactoring changes in issues to better understand what developers consider to be problematic in their code and how they handle it. Our approach relies on text mining 45,477 refactoring-related issues and identifying refactoring patterns from a diverse corpus of 77 Java projects by investigating issues associated with 15,833 refactoring operations and developers' explicit refactoring intention. Our results show that (1) developers mostly use move refactoring related terms/phrases to target refactoring-related issues; and (2) developers tend to explicitly mention the improvement of specific quality attributes and focus on duplicate code removal. We envision our findings enabling tool builders to support developers with automated documentation of refactoring changes in issues. △ Less

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2203.05660 [pdf, other]

Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring

Authors: Anthony Peruma, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: To meet project timelines or budget constraints, developers intentionally deviate from writing optimal code to feasible code in what is known as incurring Technical Debt (TD). Furthermore, as part of planning their correction, developers document these deficiencies as comments in the code (i.e., self-admitted technical debt or SATD). As a means of improving source code quality, developers often ap… ▽ More To meet project timelines or budget constraints, developers intentionally deviate from writing optimal code to feasible code in what is known as incurring Technical Debt (TD). Furthermore, as part of planning their correction, developers document these deficiencies as comments in the code (i.e., self-admitted technical debt or SATD). As a means of improving source code quality, developers often apply a series of refactoring operations to their codebase. In this study, we explore developers repaying this debt through refactoring operations by examining occurrences of SATD removal in the code of 76 open-source Java systems. Our findings show that TD payment usually occurs with refactoring activities and developers refactor their code to remove TD for specific reasons. We envision our findings supporting vendors in providing tools to better support developers in the automatic repayment of technical debt. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: This study has been accepted for publication at: The 19th International Conference on Mining Software Repositories (MSR 2022)

arXiv:2203.00113 [pdf, other]

Understanding Digits in Identifier Names: An Exploratory Study

Authors: Anthony Peruma, Christian D. Newman

Abstract: Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers free to construct identifier names to their liking, it can be difficult to automatically reason about the quality and semantics behind an identifier… ▽ More Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers free to construct identifier names to their liking, it can be difficult to automatically reason about the quality and semantics behind an identifier name. Studying the structure of identifier names can help alleviate this problem. Existing research focuses on studying words within identifiers, but there are other symbols that appear in identifier names -- such as digits. This paper explores the presence and purpose of digits in identifier names through an empirical study of 800 open-source Java systems. We study how digits contribute to the semantics of identifier names and how identifier names that contain digits evolve over time through renaming. We envision our findings improving the efficiency of name appraisal and recommendation tools and techniques. △ Less

Submitted 15 March, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

Comments: This study has been accepted for publication at: The 1st International Workshop on Natural Language-based Software Engineering (NLBSE 2022)

arXiv:2110.12229 [pdf, other]

doi 10.1007/s10664-021-10045-x

How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow

Authors: Anthony Peruma, Steven Simmons, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni

Abstract: An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with… ▽ More An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with several studies altering between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about their adoption in practice. Analyzing the perception of developers is critical to understand better what developers consider to be problematic in their code and how they handle it. Additionally, there is a need for bridging the gap between refactoring, as research, and its adoption in practice, by extracting common refactoring intents that are more suitable for what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that Stack Overflow is utilized by a diverse set of developers for refactoring assistance for a variety of technologies. Our observations show five areas that developers typically require help with refactoring -- Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision our findings better bridge the support between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support. △ Less

Submitted 23 October, 2021; originally announced October 2021.

Comments: Part of a collection: Collective Knowledge in Software Engineering ISSN: 1382-3256 (Print) 1573-7616 (Online)

Journal ref: Empir Software Eng 27, 11 (2022)

arXiv:2109.11089 [pdf, other]

Behind the Scenes: On the Relationship Between Developer Experience and Refactoring

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

Abstract: Refactoring is widely recognized as one of the efficient techniques to manage technical debt and maintain a healthy software project through enforcing best design practices or co** with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system's design and disposing of leadership roles in… ▽ More Refactoring is widely recognized as one of the efficient techniques to manage technical debt and maintain a healthy software project through enforcing best design practices or co** with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system's design and disposing of leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results by analyzing 800 open-source projects. We mine their refactoring activities, and we identify their corresponding contributors. Then, we associate an experience score to each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity. We found that (1) although refactoring is not restricted to a subset of developers, those with higher contribution scores tend to perform more refactorings than others; (2) while there is no correlation between experience and motivation behind refactoring, top contributed developers are found to perform a wider variety of refactoring operations, regardless of their complexity; and (3) top contributed developer tend to document less their refactoring activity. Our qualitative analysis of three randomly sampled projects shows that the developers who are responsible for the majority of refactoring activities are typically in advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to. △ Less

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2109.00629 [pdf, other]

doi 10.1109/TSE.2021.3098242

An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags

Authors: Christian D. Newman, Michael J. Decker, Reem S. AlSuhaibani, Anthony Peruma, Satyajit Mohapatra, Tejal Vishnoi, Marcos Zampieri, Mohamed W. Mkaouer, Timothy J. Sheldon, Emily Hill

Abstract: This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE… ▽ More This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names: function, class, attribute, parameter, and declaration statement at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to promote the future amelioration of these problems through further research. Our results show that the ensemble achieves 75\% accuracy at the identifier level and 84-86\% accuracy at the word level. This is an increase of +17\% points at the identifier level from the closest independent part-of-speech tagger. △ Less

Submitted 1 September, 2021; originally announced September 2021.

Comments: 18 pages. arXiv admin note: text overlap with arXiv:2007.08033

Journal ref: in IEEE Transactions on Software Engineering, vol. , no. 01, pp. 1-1, 5555

arXiv:2107.08344 [pdf, other]

IDEAL: An Open-Source Identifier Name Appraisal Tool

Authors: Anthony Peruma, Venera Arnaoudova, Christian D. Newman

Abstract: Developers must comprehend the code they will maintain, meaning that the code must be legible and reasonably self-descriptive. Unfortunately, there is still a lack of research and tooling that supports developers in understanding their naming practices; whether the names they choose make sense, whether they are consistent, and whether they convey the information required of them. In this paper, we… ▽ More Developers must comprehend the code they will maintain, meaning that the code must be legible and reasonably self-descriptive. Unfortunately, there is still a lack of research and tooling that supports developers in understanding their naming practices; whether the names they choose make sense, whether they are consistent, and whether they convey the information required of them. In this paper, we present IDEAL, a tool that will provide feedback to developers about their identifier naming practices. Among its planned features, it will support linguistic anti-pattern detection, which is what will be discussed in this paper. IDEAL is designed to, and will, be extended to cover further anti-patterns, naming structures, and practices in the near future. IDEAL is open-source and publicly available, with a demo video available at: https://youtu.be/fVoOYGe50zg △ Less

Submitted 17 July, 2021; originally announced July 2021.

Comments: Accepted at: The 37th International Conference on Software Maintenance and Evolution (ICSME '21)

arXiv:2104.14640 [pdf, other]

Test Smell Detection Tools: A Systematic Map** Study

Authors: Wajdi Aljedaani, Anthony Peruma, Ahmed Aljohani, Mazen Alotaibi, Mohamed Wiem Mkaouer, Ali Ouni, Christian D. Newman, Abdullatif Ghallab, Stephanie Ludi

Abstract: Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of thes… ▽ More Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of these published tools. In this paper, we provide a detailed catalog of all known, peer-reviewed, test smell detection tools. We start with performing a comprehensive search of peer-reviewed scientific publications to construct a catalog of 22 tools. Then, we perform a comparative analysis to identify the smell types detected by each tool and other salient features that include programming language, testing framework support, detection strategy, and adoption, among others. From our findings, we discover tools that detect test smells in Java, Scala, Smalltalk, and C++ test suites, with Java support favored by most tools. These tools are available as command-line and IDE plugins, among others. Our analysis also shows that most tools overlap in detecting specific smell types, such as General Fixture. Further, we encounter four types of techniques these tools utilize to detect smells. We envision our study as a one-stop source for researchers and practitioners in determining the tool appropriate for their needs. Our findings also empower the community with information to guide future tool development. △ Less

Submitted 3 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: Accepted at: The International Conference on Evaluation and Assessment in Software Engineering (EASE '21)

arXiv:2103.13865 [pdf, ps, other]

doi 10.1109/ICSME.2019.00103

Towards a Model to Appraise and Suggest Identifier Names

Authors: Anthony Peruma

Abstract: Unknowingly, identifiers in the source code of a software system play a vital role in determining the quality of the system. Ambiguous and confusing identifier names lead developers to not only misunderstand the behavior of the code but also increases comprehension time and thereby causes a loss in productivity. Even though correcting poor names through rename operations is a viable option for sol… ▽ More Unknowingly, identifiers in the source code of a software system play a vital role in determining the quality of the system. Ambiguous and confusing identifier names lead developers to not only misunderstand the behavior of the code but also increases comprehension time and thereby causes a loss in productivity. Even though correcting poor names through rename operations is a viable option for solving this problem, renaming itself is an act of rework and is not immune to defect injection. In this study, we aim to understand the motivations that drive developers to name and rename identifiers and the decisions they make in determining the name. Using our results, we propose the development of a linguistic model that determines identifier names based on the behavior of the identifier. As a prerequisite to constructing the model, we conduct multiple studies to determine the features that should feed into the model. In this paper, we discuss findings from our completed studies and justify the continuation of research on this topic through further studies. △ Less

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.09388 [pdf, other]

On the Distribution of "Simple Stupid Bugs" in Unit Test Files: An Exploratory Study

Authors: Anthony Peruma, Christian D. Newman

Abstract: A key aspect of ensuring the quality of a software system is the practice of unit testing. Through unit tests, developers verify the correctness of production source code, thereby verifying the system's intended behavior under test. However, unit test code is subject to issues, ranging from bugs in the code to poor test case design (i.e., test smells). In this study, we compare and contrast the oc… ▽ More A key aspect of ensuring the quality of a software system is the practice of unit testing. Through unit tests, developers verify the correctness of production source code, thereby verifying the system's intended behavior under test. However, unit test code is subject to issues, ranging from bugs in the code to poor test case design (i.e., test smells). In this study, we compare and contrast the occurrences of a type of single-statement-bug-fix known as "simple stupid bugs" (SStuBs) in test and non-test (i.e., production) files in popular open-source Java Maven projects. Our results show that SStuBs occur more frequently in non-test files than in test files, with most fix-related code associated with assertion statements in test files. Further, most test files exhibiting SStuBs also exhibit test smells. We envision our findings enabling tool vendors to better support developers in improving the maintenance of test suites. △ Less

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2103.09190 [pdf, other]

Using Grammar Patterns to Interpret Test Method Name Evolution

Authors: Anthony Peruma, Emily Hu, Jiajun Chen, Eman Abdullah Alomar, Mohamed Wiem Mkaouer, Christian D. Newman

Abstract: It is good practice to name test methods such that they are comprehensible to developers; they must be written in such a way that their purpose and functionality are clear to those who will maintain them. Unfortunately, there is little automated support for writing or maintaining the names of test methods. This can lead to inconsistent and low-quality test names and increase the maintenance cost o… ▽ More It is good practice to name test methods such that they are comprehensible to developers; they must be written in such a way that their purpose and functionality are clear to those who will maintain them. Unfortunately, there is little automated support for writing or maintaining the names of test methods. This can lead to inconsistent and low-quality test names and increase the maintenance cost of supporting these methods. Due to this risk, it is essential to help developers in maintaining their test method names over time. In this paper, we use grammar patterns, and how they relate to test method behavior, to understand test naming practices. This data will be used to support an automated tool for maintaining test names. △ Less

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2010.13890 [pdf, other]

doi 10.1016/j.eswa.2020.114176

How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation

Authors: Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni, Marouane Kessentini

Abstract: Refactoring is the art of improving the design of a system without altering its external behavior. Refactoring has become a well established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate re… ▽ More Refactoring is the art of improving the design of a system without altering its external behavior. Refactoring has become a well established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactorings in other development activities that go beyond improving the design. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, we aim to better understand what motivates developers to apply refactoring by mining and classifying a large set of 111,884 commits containing refactorings, extracted from 800 Java projects. We trained a multi-class classifier to categorize these commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories. This classification challenges the original definition of refactoring, being exclusive to improving the design and fixing code smells. Further, to better understand our classification results, we analyzed commit messages to extract textual patterns that developers regularly use to describe their refactorings. The results show that (1) fixing code smells is not the main driver for developers to refactoring their codebases. Refactoring is solicited for a wide variety of reasons, going beyond its traditional definition; (2) the distribution of refactorings differs between production and test files; (3) developers use several patterns to purposefully target refactoring; (4) the textual patterns, extracted from commit messages, provide better coverage for how developers document their refactorings. △ Less

Submitted 26 October, 2020; originally announced October 2020.

Journal ref: Expert Systems with Applications. 167 (2021) 114176

arXiv:2007.08033 [pdf, other]

doi 10.1016/j.jss.2020.110740

On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers

Authors: Christian D. Newman, Reem S. AlSuhaibani, Michael J. Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, Emily Hill

Abstract: Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in identifier naming practices and how accurately we are able to automatically model these patterns is vital if researchers are to support developers and automated a… ▽ More Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in identifier naming practices and how accurately we are able to automatically model these patterns is vital if researchers are to support developers and automated analysis approaches in comprehending and creating identifiers correctly and optimally. This paper investigates identifiers by studying sequences of part-of-speech annotations, referred to as grammar patterns. This work advances our understanding of these patterns and our ability to model them by 1) establishing common naming patterns in different types of identifiers, such as class and attribute names; 2) analyzing how different patterns influence comprehension; and 3) studying the accuracy of state-of-the-art techniques for part-of-speech annotations, which are vital in automatically modeling identifier naming patterns, in order to establish their limits and paths toward improvement. To do this, we manually annotate a dataset of 1,335 identifiers from 20 open-source systems and use this dataset to study naming patterns, semantics, and tagger accuracy. △ Less

Submitted 15 July, 2020; originally announced July 2020.

Comments: 69 pages, 3 figures, 16 tables

Journal ref: Journal of Systems and Software, 2020, 110740, ISSN 0164-1212

Showing 1–21 of 21 results for author: Peruma, A