Search | arXiv e-print repository

doi 10.1016/j.jss.2022.111515

A Decade of Code Comment Quality Assessment: A Systematic Literature Review

Authors: Pooja Rani, Arianna Blasi, Nataliia Stulova, Sebastiano Panichella, Alessandra Gorla, Oscar Nierstrasz

Abstract: Code comments are important artifacts in software systems and play a paramount role in many software engineering (SE) tasks related to maintenance and program comprehension. However, while it is widely accepted that high quality matters in code comments just as it matters in source code, assessing comment quality in practice is still an open problem. First and foremost, there is no unique definition of quality when it comes to evaluating code comments. The few existing studies on this topic rather focus on specific attributes of quality that can be easily quantified and measured. Existing techniques and corresponding tools may also focus on comments bound to a specific programming language, and may only deal with comments with specific scopes and clear goals (e.g., Javadoc comments at the method level, or in-body comments describing TODOs to be addressed). In this paper, we present a Systematic Literature Review (SLR) of the last decade of research in SE to answer the following research questions: (i) What types of comments do researchers focus on when assessing comment quality? (ii) What quality attributes (QAs) do they consider? (iii) Which tools and techniques do they use to assess comment quality?, and (iv) How do they evaluate their studies on comment quality assessment in general?
Our evaluation, based on the analysis of 2353 papers and the actual review of 47 relevant ones, shows that (i) most studies and techniques focus on comments in Java code, thus may not be generalizable to other languages, and (ii) the analyzed studies focus on four main QAs of a total of 21 QAs identified in the literature, with a clear predominance of checking consistency between comments and the code. We observe that researchers rely on manual assessment and specific heuristics rather than the automated assessment of the comment quality attributes.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2201.04853 [pdf]

FuzzingDriver: the Missing Dictionary to Increase Code Coverage in Fuzzers

Authors: Arash Ale Ebrahim, Mohammadreza Hazhirpasand, Oscar Nierstrasz, Mohammad Ghafari

Abstract: We propose a tool, called FuzzingDriver, to generate dictionary tokens for coverage-based greybox fuzzers (CGF) from the codebase of any target program. FuzzingDriver does not add any overhead to the fuzzing job as it is run beforehand. We compared FuzzingDriver to Google dictionaries by fuzzing six open-source targets, and we found that FuzzingDriver consistently achieves higher code coverage in all tests. We also executed eight benchmarks on FuzzBench to demonstrate how utilizing FuzzingDriver's dictionaries can outperform six widely-used CGF fuzzers. In future work, investigating the impact of FuzzingDriver's dictionaries on improving bug coverage might prove important. Video demonstration: https://www.youtube.com/watch?v=Y8j_KvfRrI8

Submitted 13 January, 2022; originally announced January 2022.

Comments: 29th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022
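
To make the idea of deriving dictionary tokens from a codebase concrete, here is a minimal Python sketch. It is not FuzzingDriver's algorithm, which the abstract does not describe; it only assumes that string literals in the target's source are plausible tokens, and writes them as quoted entries, one per line, in the style of AFL dictionaries. The names `extract_tokens` and `to_dictionary`, and the length limits, are hypothetical.

```python
import re

# Matches double-quoted C-style string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')


def extract_tokens(source, min_len=2, max_len=32):
    """Collect unique string literals from source text, in order of appearance.

    Literals containing backslash escapes or non-printable characters are
    skipped so that each token can be written to the dictionary unchanged.
    """
    seen = set()
    tokens = []
    for match in STRING_LITERAL.finditer(source):
        token = match.group(1)
        if "\\" in token or not token.isprintable():
            continue
        if min_len <= len(token) <= max_len and token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


def to_dictionary(tokens):
    """Render tokens as quoted dictionary entries, one per line."""
    return "\n".join(f'"{token}"' for token in tokens)


if __name__ == "__main__":
    sample = 'if (strcmp(m, "GET") == 0) puts("a\\n"); h = "Content-Length"; m2 = "GET";'
    print(to_dictionary(extract_tokens(sample)))
```

Because such a dictionary is produced before fuzzing starts, generating it adds no cost to the fuzzing run itself, which is the property the abstract emphasizes.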

arXiv:2111.03601 [pdf]

Security Header Fields in HTTP Clients

Authors: Pascal Gadient, Oscar Nierstrasz, Mohammad Ghafari

Abstract: HTTP headers are commonly used to establish web communications, and some of them are relevant for security. However, we have only little information about the usage and support of security-relevant headers in mobile applications. We explored the adoption of such headers in mobile app communication by querying 9,714 distinct URLs that were used in 3,376 apps and collected each server's response information. We discovered that support for secure HTTP header fields is absent in all major HTTP clients, and it is barely provided with any server response. Based on these results, we discuss opportunities for improvement particularly to reduce the likelihood of data leaks and arbitrary code execution. We advocate more comprehensive use of existing HTTP headers and timely development of relevant web browser security features in HTTP client libraries.

Submitted 5 November, 2021; originally announced November 2021.

Comments: The 21st IEEE International Conference on Software Quality, Reliability and Security (QRS 2021)
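
As a rough illustration of the kind of check the abstract above describes, the following Python sketch reports which security-relevant headers are missing from a server response. The header list and the function name `missing_security_headers` are assumptions chosen for illustration; the abstract does not enumerate the headers the authors examined.

```python
# Illustrative list of well-known security-relevant response headers.
# This selection is an assumption, not the set studied in the paper.
SECURITY_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the entries of SECURITY_HEADERS absent from a response.

    HTTP header names are case-insensitive, so names are compared in lower case.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in SECURITY_HEADERS if h.lower() not in present]


if __name__ == "__main__":
    example = {
        "content-type": "text/html",
        "strict-transport-security": "max-age=31536000",
    }
    print(missing_security_headers(example))
```

The function takes any mapping of header names to values, so it can be applied to the header dictionary returned by whichever HTTP client is used to query the URLs.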

arXiv:2111.03596 [pdf, other]

Phish What You Wish

Authors: Pascal Gadient, Pascal Gerig, Oscar Nierstrasz, Mohammad Ghafari

Abstract: IT professionals have no simple tool to create phishing websites and raise the awareness of users. We developed a prototype that can dynamically mimic websites by using enriched screenshots, which requires no additional programming experience and is simple to set up. The generated websites are functional and remain up-to-date. We found that 98% of the hyperlinks in mimicked websites are functional with our tool, compared to 43% with the best competitor, and only two participants suspected phishing attempts at the time they were performing tasks with our prototype. This work intends to raise awareness for phishing attempts especially with local websites by providing an easy to use prototype to set up such phishing sites.

Submitted 5 November, 2021; originally announced November 2021.

Comments: The 21st IEEE International Conference on Software Quality, Reliability and Security (QRS 2021)

arXiv:2111.01406 [pdf, ps, other]

Dazed and Confused: What's Wrong with Crypto Libraries?

Authors: Mohammadreza Hazhirpasand, Oscar Nierstrasz, Mohammad Ghafari

Abstract: Recent studies have shown that developers have difficulties in using cryptographic APIs, which often led to security flaws. We are interested to tackle this matter by looking into what types of problems exist in various crypto libraries. We manually studied 500 posts on Stack Overflow associated with 20 popular crypto libraries. We realized there are 10 themes in the discussions. Interestingly, there were only two questions related to attacks against cryptography. There were 63 discussions in which developers had interoperability issues when working with more than one crypto library. The majority of posts (i.e. 112) were about encryption/decryption problems and 111 were about installation/compilation issues of crypto libraries. Overall, we realize that the crypto libraries are frequently involved in more than five themes of discussions. We believe the current initial findings can help team leaders and experienced developers to correctly guide the team members in the domain of cryptography. Moreover, future research should investigate the similarity of problems at the API level among popular crypto libraries.

Submitted 2 November, 2021; originally announced November 2021.

Comments: 18th Annual International Conference on Privacy, Security and Trust (PST2021)

arXiv:2109.15093 [pdf, other]

Crypto Experts Advise What They Adopt

Authors: Mohammadreza Hazhirpasand, Oscar Nierstrasz, Mohammad Ghafari

Abstract: Previous studies have shown that developers regularly seek advice on online forums to resolve their cryptography issues. We investigated whether users who are active in cryptography discussions also use cryptography in practice. We collected the top 1% of responders who have participated in crypto discussions on Stack Overflow, and we manually analyzed their crypto contributions to open source projects on GitHub. We could identify 319 GitHub profiles that belonged to such crypto responders and found that 189 of them used cryptography in their projects. Further investigation revealed that the majority of analyzed users (i.e., 85%) use the same programming languages for crypto activity on Stack Overflow and crypto contributions on GitHub. Moreover, 90% of the analyzed users employed the same concept of cryptography in their projects as they advised about on Stack Overflow.

Submitted 30 September, 2021; originally announced September 2021.

Comments: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)

arXiv:2109.14363 [pdf, other]

Worrisome Patterns in Developers: A Survey in Cryptography

Authors: Mohammadreza Hazhirpasand, Oscar Nierstrasz, Mohammad Ghafari

Abstract: We surveyed 97 developers who had used cryptography in open-source projects, in the hope of identifying developer security and cryptography practices. We asked them about individual and company-level practices, and divided respondents into three groups (i.e., high, medium, and low) based on their level of knowledge. We found differences between the high-profile developers and the other two groups. For instance, high-profile developers have more years of experience in programming, have attended more security and cryptography courses, have more background in security, are highly concerned about security, and tend to use security tools more than the other two groups. Nevertheless, we observed worrisome patterns among all participants such as the high usage of unreliable sources like Stack Overflow, and the low rate of security tool usage.

Submitted 30 September, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)

arXiv:2108.11205 [pdf, other]

RepliComment: Identifying Clones in Code Comments

Authors: Arianna Blasi, Nataliia Stulova, Alessandra Gorla, Oscar Nierstrasz

Abstract: Code comments are the primary means to document implementation and facilitate program comprehension. Thus, their quality should be a primary concern to improve program maintenance. While much effort has been dedicated to detecting bad smells, such as clones in code, little work has focused on comments. In this paper we present our solution to detect clones in comments that developers should fix. RepliComment can automatically analyze Java projects and report instances of copy-and-paste errors in comments, and can point developers to which comments should be fixed. Moreover, it can report when clones are signs of poorly written comments. Developers should fix these instances too in order to improve the quality of the code documentation. Our evaluation of 10 well-known open source Java projects identified over 11K instances of comment clones, and over 1,300 of them are potentially critical. We improve on our own previous work, which could only find 36 issues in the same dataset. Our manual inspection of 412 issues reported by RepliComment reveals that it achieves a precision of 79% in reporting critical comment clones. The manual inspection of 200 additional comment clones that RepliComment filters out as being legitimate, could not evince any false negative.

Submitted 25 August, 2021; originally announced August 2021.

Comments: 31 pages, 1 figure, 9 tables. To appear in the Journal of Systems and Software

ACM Class: D.2.7; D.2.9

arXiv:2108.10766 [pdf, other]

doi 10.1109/SCAM52516.2021.00028

Do Comments follow Commenting Conventions? A Case Study in Java and Python

Authors: Pooja Rani, Suada Abukar, Nataliia Stulova, Alexandre Bergel, Oscar Nierstrasz

Abstract: Assessing code comment quality is known to be a difficult problem. A number of coding style guidelines have been created with the aim to encourage writing of informative, readable, and consistent comments. However, it is not clear from the research to date which specific aspects of comments the guidelines cover (e.g., syntax, content, structure). Furthermore, the extent to which developers follow these guidelines while writing code comments is unknown. We analyze various style guidelines in Java and Python and uncover that the majority of them address more the content aspect of the comments rather than syntax or formatting, but when considering the different types of information developers embed in comments and the concerns they raise on various online platforms about the commenting practices, existing comment conventions are not yet specified clearly enough, nor do they adequately cover important concerns. We also analyze commenting practices of developers in diverse projects to see the extent to which they follow the guidelines. Our results highlight the mismatch between developer commenting practices and style guidelines, and provide several focal points for the design and improvement of comment quality checking tools.

Submitted 27 August, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

Comments: 5 pages, 3 figures, conference

arXiv:2108.07648 [pdf, other]

doi 10.1109/SCAM52516.2021.00027

What Do Developers Discuss about Code Comments?

Authors: Pooja Rani, Mathias Birrer, Sebastiano Panichella, Mohammad Ghafari, Oscar Nierstrasz

Abstract: Code comments are important for program comprehension, development, and maintenance tasks. Given the varying standards for code comments, and their unstructured or semi-structured nature, developers get easily confused (especially novice developers) about which convention(s) to follow, or what tools to use while writing code documentation. Thus, they post related questions on external online sources to seek better commenting practices. In this paper, we analyze code comment discussions on online sources such as Stack Overflow (SO) and Quora to shed some light on the questions developers ask about commenting practices. We apply Latent Dirichlet Allocation (LDA) to identify emerging topics concerning code comments. Then we manually analyze a statistically significant sample set of posts to derive a taxonomy that provides an overview of the developer questions about commenting practices. Our results highlight that on SO nearly 40% of the questions mention how to write or process comments in documentation tools and environments, and nearly 20% of the questions are about potential limitations and possibilities of documentation tools to add automatically and consistently more information in comments. On the other hand, on Quora, developer questions focus more on background information (35% of the questions) or asking opinions (16% of the questions) about code comments.
We found that (i) not all aspects of comments are covered in coding style guidelines, e.g., how to add a specific type of information, (ii) developers need support in learning the syntax and format conventions to add various types of information in comments, and (iii) developers are interested in various automated strategies for comments such as detection of bad comments, or verify comment style automatically, but lack tool support to do that.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 21st IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM'21)

arXiv:2108.07188 [pdf, other]

doi 10.1145/3475716.3475780

Security Smells Pervade Mobile App Servers

Authors: Pascal Gadient, Marc-Andrea Tarnutzer, Oscar Nierstrasz, Mohammad Ghafari

Abstract: [Background] Web communication is universal in cyberspace, and security risks in this domain are devastating. [Aims] We analyzed the prevalence of six security smells in mobile app servers, and we investigated the consequence of these smells from a security perspective. [Method] We used an existing dataset that includes 9714 distinct URLs used in 3376 Android mobile apps. We exercised these URLs twice within 14 months and investigated the HTTP headers and bodies. [Results] We found that more than 69% of tested apps suffer from three kinds of security smells, and that unprotected communication and misconfigurations are very common in servers. Moreover, source-code and version leaks, or the lack of update policies expose app servers to security risks. [Conclusions] Poor app server maintenance greatly hampers security.

Submitted 16 August, 2021; originally announced August 2021.

Comments: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2021)

arXiv:2108.07141 [pdf, other]

Hurdles for Developers in Cryptography

Authors: Mohammadreza Hazhirpasand, Oscar Nierstrasz, Mohammadhossein Shabani, Mohammad Ghafari

Abstract: Prior research has shown that cryptography is hard to use for developers. We aim to understand what cryptography issues developers face in practice. We clustered 91954 cryptography-related questions on the Stack Overflow website, and manually analyzed a significant sample (i.e., 383) of the questions to comprehend the crypto challenges developers commonly face in this domain. We found that either developers have a distinct lack of knowledge in understanding the fundamental concepts, e.g., OpenSSL, public-key cryptography or password hashing, or the usability of crypto libraries undermined developer performance to correctly realize a crypto scenario. This is alarming and indicates the need for dedicated research to improve the design of crypto APIs.

Submitted 16 August, 2021; originally announced August 2021.

Comments: ICSME 2021 - NIER Track
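
The sample of 383 out of 91954 questions mentioned in the abstract above is consistent with a standard sample-size calculation. The abstract does not state the parameters, so the sketch below assumes Cochran's formula with a finite population correction, a 95% confidence level (z = 1.96), a 5% margin of error, and maximum variability (p = 0.5); the function name `sample_size` is hypothetical.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction.

    z      : z-score of the confidence level (1.96 for 95%, assumed here)
    margin : tolerated margin of error (5%, assumed here)
    p      : expected proportion; 0.5 gives the most conservative size
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Correct for the finite population and round up to whole items.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    print(sample_size(91954))  # 383 under the assumptions above
```

Under these assumptions the uncorrected size is about 384.16, and the correction for a population of 91954 brings it to about 382.6, which rounds up to 383.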

arXiv:2107.04521 [pdf, other]

doi 10.1016/j.jss.2021.111047

How to Identify Class Comment Types? A Multi-language Approach for Class Comment Classification

Authors: Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Andrea Di Sorbo, Oscar Nierstrasz

Abstract: Most software maintenance and evolution tasks require developers to understand the source code of their software systems. Software developers usually inspect class comments to gain knowledge about program behavior, regardless of the programming language they are using. Unfortunately, (i) different programming languages present language-specific code commenting notations/guidelines; and (ii) the source code of software projects often lacks comments that adequately describe the class behavior, which complicates program comprehension and evolution activities. To handle these challenges, this paper investigates the different language-specific class commenting practices of three programming languages: Python, Java, and Smalltalk. In particular, we systematically analyze the similarities and differences of the information types found in class comments of projects developed in these languages. We propose an approach that leverages two techniques, namely Natural Language Processing and Text Analysis, to automatically identify various types of information from class comments i.e., the specific types of semantic information found in class comments. To the best of our knowledge, no previous work has provided a comprehensive taxonomy of class comment types for these three programming languages with the help of a common automated approach. Our results confirm that our approach can classify frequent class comment information types with high accuracy for Python, Java, and Smalltalk programming languages.
We believe this work can help to monitor and assess the quality and evolution of code comments in different program languages, and thus support maintenance and evolution tasks.

Submitted 25 July, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: 25 pages, 10 figures, 8 tables

arXiv:2009.01101 [pdf, other]

Java Cryptography Uses in the Wild

Authors: Mohammadreza Hazhirpasand, Mohammad Ghafari, Oscar Nierstrasz

Abstract: [Background] Previous research has shown that developers commonly misuse cryptography APIs. [Aim] We have conducted an exploratory study to find out how crypto APIs are used in open-source Java projects, what types of misuses exist, and why developers make such mistakes. [Method] We used a static analysis tool to analyze hundreds of open-source Java projects that rely on Java Cryptography Architecture, and manually inspected half of the analysis results to assess the tool results. We also contacted the maintainers of these projects by creating an issue on the GitHub repository of each project, and discussed the misuses with developers. [Results] We learned that 85% of Cryptography APIs are misused, however, not every misuse has severe consequences. Developer feedback showed that security caveats in the documentation of crypto APIs are rare, developers may overlook misuses that originate in third-party code, and the context where a Crypto API is used should be taken into account. [Conclusion] We conclude that using Crypto APIs is still problematic for developers but blindly blaming them for such misuses may lead to erroneous conclusions.

Submitted 2 September, 2020; originally announced September 2020.

Comments: The ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2020

arXiv:2006.01181 [pdf, other]

doi 10.1109/SCAM.2017.24

Security Smells in Android

Authors: Mohammad Ghafari, Pascal Gadient, Oscar Nierstrasz

Abstract: The ubiquity of smartphones, and their very broad capabilities and usage, make the security of these devices tremendously important. Unfortunately, despite all progress in security and privacy mechanisms, vulnerabilities continue to proliferate. Research has shown that many vulnerabilities are due to insecure programming practices. However, each study has often dealt with a specific issue, making the results less actionable for practitioners. To promote secure programming practices, we have reviewed related research, and identified avoidable vulnerabilities in Android-run devices and the "security code smells" that indicate their presence. In particular, we explain the vulnerabilities, their corresponding smells, and we discuss how they could be eliminated or mitigated during development. Moreover, we develop a lightweight static analysis tool and discuss the extent to which it successfully detects several vulnerabilities in about 46,000 apps hosted by the official Android market.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)

arXiv:2005.11583 [pdf, other]

doi 10.1007/s10664-021-09981-5

What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk

Authors: Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Mohammad Ghafari, Oscar Nierstrasz

Abstract: Previous studies have characterized code comments in various programming languages to support better program comprehension activities and maintenance tasks. However, very few studies have focused on understanding developer practices to write comments. None of them has compared such developer practices to the standard comment guidelines to study the extent to which developers follow the guidelines. This paper reports the first empirical study investigating commenting practices in Pharo Smalltalk. First, we analyze class comment evolution over seven Pharo versions. Then, we investigate the information types embedded in class comments. Finally, we study the adherence of developer commenting practices to the official class comment template over Pharo versions. The results of this study show that there is a rapid increase in class comments in the initial three Pharo versions, while in subsequent versions developers added comments to both new and old classes, thus maintaining a similar code to comment ratio. We furthermore found three times as many information types in class comments as those suggested by the template. However, the information types suggested by the template tend to be present more often than other types of information.
Additionally, we find that a substantial proportion of comments follow the writing style of the template in writing these information types, but they are written and formatted in a non-uniform way. This suggests the need to standardize the commenting guidelines for formatting the text, and to provide headers for the different information types to ensure a consistent style and to identify the information easily. Given the importance of high-quality code comments, we draw numerous implications for developers and researchers to improve the support for comment quality assessment tools.

Submitted 15 June, 2021; v1 submitted 23 May, 2020; originally announced May 2020.

Comments: 35 pages, 26 figures, 10 tables, Journal format, five authors, three research questions

Journal ref: Empirical Software Engineering, 2021

arXiv:2002.08463 [pdf, other]

Tricking Johnny into Granting Web Permissions

Authors: Mohammadreza Hazhirpasand, Mohammad Ghafari, Oscar Nierstrasz

Abstract: We studied the web permission API dialog box in popular mobile and desktop browsers, and found that it typically lacks measures to protect users from unwittingly granting web permission when clicking too fast. We developed a game that exploits this issue, and tricks users into granting webcam permission. We conducted three experiments, each with 40 different participants, on both desktop and mobile browsers. The results indicate that in the absence of a prevention mechanism, we achieve a considerably high success rate in tricking 95% and 72% of participants on mobile and desktop browsers, respectively. Interestingly, we also tricked 47% of participants on a desktop browser where a prevention mechanism exists.

Submitted 19 February, 2020; originally announced February 2020.

Comments: The 24th International Conference on Evaluation and Assessment in Software Engineering (EASE 2020)

arXiv:2002.08458 [pdf, ps, other]

Caveats in Eliciting Mobile App Requirements

Authors: Nitish Patkar, Mohammad Ghafari, Oscar Nierstrasz, Sofija Hotomski

Abstract: Factors such as app stores or platform choices heavily affect functional and non-functional mobile app requirements. We surveyed 45 companies and interviewed ten experts to explore how factors that impact mobile app requirements are understood by requirements engineers in the mobile app industry. We observed a lack of knowledge in several areas. For instance, we observed that all practitioners were aware of data privacy concerns; however, they did not know that certain third-party libraries, usage aggregators, or advertising libraries also occasionally leak sensitive user data. Similarly, certain functional requirements may not be implementable in the absence of a third-party library that is either banned from an app store for policy violations or lacks features; for instance, missing desired features in the ARKit library for iOS made practitioners turn to Android. We conclude that requirements engineers should have adequate technical experience with mobile app development as well as sufficient knowledge in areas such as privacy, security and law, in order to make informed decisions during requirements elicitation.

Submitted 19 February, 2020; originally announced February 2020.

Comments: The 24th International Conference on Evaluation and Assessment in Software Engineering (EASE 2020)

arXiv:2001.00773 [pdf, other]

CryptoExplorer: An Interactive Web Platform Supporting Secure Use of Cryptography APIs

Authors: Mohammadreza Hazhirpasand, Mohammad Ghafari, Oscar Nierstrasz

Abstract: Research has shown that cryptographic APIs are hard to use. Consequently, developers resort to using code examples available in online information sources that are often not secure. We have developed a web platform, named CryptoExplorer, stocked with numerous real-world secure and insecure examples that developers can explore to learn how to use cryptographic APIs properly. This platform currently provides 3,263 secure uses, and 5,897 insecure uses of Java Cryptography Architecture mined from 2,324 Java projects on GitHub. A preliminary study shows that CryptoExplorer provides developers with secure crypto API use examples instantly, developers can save time compared to searching on the internet for such examples, and they learn to avoid using certain algorithms in APIs by studying misused API examples. We have a pipeline to regularly mine more projects, and, on request, we offer our dataset to researchers.

Submitted 3 January, 2020; originally announced January 2020.

Comments: 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). London, Ontario, Canada, February 18-21, 2020

arXiv:2001.00195 [pdf, other]

doi 10.1109/SANER48275.2020.9054850

Web APIs in Android through the Lens of Security

Authors: Pascal Gadient, Mohammad Ghafari, Marc-Andrea Tarnutzer, Oscar Nierstrasz

Abstract: Web communication has become an indispensable characteristic of mobile apps. However, it is not clear what data the apps transmit, to whom, and what consequences such transmissions have. We analyzed the web communications found in mobile apps from the perspective of security. We first manually studied 160 Android apps to identify the commonly-used communication libraries, and to understand how they are used in these apps. We then developed a tool to statically identify web API URLs used in the apps, and restore the JSON data schemas including the type and value of each parameter. We extracted 9,714 distinct web API URLs that were used in 3,376 apps. We found that developers often use the java.net package for network communication; however, third-party libraries like OkHttp are also used in many apps. We discovered that insecure HTTP connections are seven times more prevalent in closed-source than in open-source apps, and that embedded SQL and JavaScript code is used in web communication in more than 500 different apps. This finding is devastating; it leaves billions of users and API service providers vulnerable to attack.

Submitted 1 June, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

Comments: 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). London, Ontario, Canada, February 18-21, 2020

arXiv:1908.04090 [pdf, other]

VISON: An Ontology-Based Approach for Software Visualization Tool Discoverability

Authors: Leonel Merino, Ekaterina Kozlova, Oscar Nierstrasz, Daniel Weiskopf

Abstract: Although many tools have been presented in the research literature of software visualization, there is little evidence of their adoption. To choose a suitable visualization tool, practitioners need to analyze various characteristics of tools such as their supported software concerns and level of maturity. Indeed, some tools can be prototypes for which the lifespan is expected to be short, whereas others can be fairly mature products that are maintained for a longer time. Although such characteristics are often described in papers, we conjecture that practitioners willing to adopt software visualizations require additional support to discover suitable visualization tools. In this paper, we elaborate on our efforts to provide such support. To this end, we systematically analyzed research papers in the literature of software visualization and curated a catalog of 70 available tools that employ various visualization techniques to support the analysis of multiple software concerns. We further encapsulate these characteristics in an ontology. VISON, our software visualization ontology, captures these semantics as concepts and relationships. We report on early results of usage scenarios that demonstrate how the ontology can support (i) developers to find suitable tools for particular development concerns, and (ii) researchers who propose new software visualization tools to identify a baseline tool for a controlled experiment.

Submitted 12 August, 2019; originally announced August 2019.

Comments: 11 pages, 12 figures, 2 tables. VISSOFT 2019

arXiv:1908.01489 [pdf, other]

The Impact of Developer Experience in Using Java Cryptography

Authors: Mohammadreza Hazhirpasand, Mohammad Ghafari, Stefan Krüger, Eric Bodden, Oscar Nierstrasz

Abstract: Previous research has shown that crypto APIs are hard for developers to understand and difficult for them to use. They consequently rely on unvalidated boilerplate code from online resources where security vulnerabilities are common. We analyzed 2,324 open-source Java projects that rely on Java Cryptography Architecture (JCA) to understand how crypto APIs are used in practice, and what factors account for the performance of developers in using these APIs. We found that, in general, the experience of developers in using JCA does not correlate with their performance. In particular, none of the factors such as the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in correlate with developer performance in this domain. We call for qualitative studies to shed light on the reasons underlying the success of developers who are expert in using cryptography. Also, detailed investigation at API level is necessary to further clarify developer obstacles in this domain.

Submitted 5 August, 2019; originally announced August 2019.

Comments: The ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

arXiv:1908.01476 [pdf, other]

Testability First!

Authors: Mohammad Ghafari, Markus Eggiman, Oscar Nierstrasz

Abstract: The pivotal role of testing in high-quality software production has driven a significant effort in evaluating and assessing testing practices. We explore the state of testing in a large industrial project over an extended period. We study the interplay between bugs in the project and its test cases, and interview developers and stakeholders to uncover reasons underpinning our observations. We realized that testing is not well adopted, and that testability (i.e., ease of testing) is low. We found that developers tended to abandon writing tests when they assessed the effort to be high. Frequent changes in requirements and pressure to add new features also hindered developers from writing tests. Regardless of the debates on test first or later, we hypothesize that the underlying reasons for poor test quality are rooted in a lack of attention to testing early in the development of a software component, leading to poor testability of the component. However, testability is usually overlooked in research that studies the impact of testing practices, and should be explicitly taken into account.

Submitted 5 August, 2019; originally announced August 2019.

Comments: The ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

arXiv:1904.06399 [pdf, other]

doi 10.1145/3302541.3313104

PerfVis: Pervasive Visualization in Immersive Augmented Reality for Performance Awareness

Authors: Leonel Merino, Mario Hess, Alexandre Bergel, Oscar Nierstrasz, Daniel Weiskopf

Abstract: Developers are usually unaware of the impact of code changes on the performance of software systems. Although developers can analyze the performance of a system by executing, for instance, a performance test to compare the performance of two consecutive versions of the system, changing from a programming task to a testing task would disrupt the development flow. In this paper, we propose the use of a city visualization that dynamically provides developers with a pervasive view of the continuous performance of a system. We use an immersive augmented reality device (Microsoft HoloLens) to display our visualization and extend the integrated development environment on a computer screen to use the physical space. We report on technical details of the design and implementation of our visualization tool, and discuss early feedback that we collected on its usability. Our investigation explores a new visual metaphor to support the exploration and analysis of possibly very large and multidimensional performance data. Our initial result indicates that the city metaphor can be adequate to analyze dynamic performance data on a large and non-trivial software system.

Submitted 5 April, 2019; originally announced April 2019.

Comments: ICPE'19 vision, 4 pages, 2 figures, conference

arXiv:1811.12713 [pdf, other]

doi 10.1007/s10664-018-9673-y

Security Code Smells in Android ICC

Authors: Pascal Gadient, Mohammad Ghafari, Patrick Frischknecht, Oscar Nierstrasz

Abstract: Android Inter-Component Communication (ICC) is complex, largely unconstrained, and hard for developers to understand. As a consequence, ICC is a common source of security vulnerability in Android apps. To promote secure programming practices, we have reviewed related research, and identified avoidable ICC vulnerabilities in Android-run devices and the security code smells that indicate their presence. We explain the vulnerabilities and their corresponding smells, and we discuss how they can be eliminated or mitigated during development. We present a lightweight static analysis tool on top of Android Lint that analyzes the code under development and provides just-in-time feedback within the IDE about the presence of such smells in the code. Moreover, with the help of this tool we study the prevalence of security code smells in more than 700 open-source apps, and manually inspect around 15% of the apps to assess the extent to which identifying such smells uncovers ICC security vulnerabilities.

Submitted 10 December, 2018; v1 submitted 30 November, 2018; originally announced November 2018.

Comments: Accepted on 28 Nov 2018, Empirical Software Engineering Journal (EMSE), 2018

arXiv:1807.04486 [pdf, other]

The Impact of Feature Selection on Predicting the Number of Bugs

Authors: Haidar Osman, Mohammad Ghafari, Oscar Nierstrasz

Abstract: Bug prediction is the process of training a machine learning model on software metrics and fault information to predict bugs in software entities. While feature selection is an important step in building a robust prediction model, there is insufficient evidence about its impact on predicting the number of bugs in software systems. We study the impact of both correlation-based feature selection (CFS) filter methods and wrapper feature selection methods on five widely-used prediction models and demonstrate how these models perform with or without feature selection to predict the number of bugs in five different open source Java software systems. Our results show that wrappers outperform the CFS filter; they improve prediction accuracy by up to 33% while eliminating more than half of the features. We also observe that though the same feature selection method chooses different feature subsets in different projects, this subset always contains a mix of source code and change metrics.

Submitted 12 July, 2018; originally announced July 2018.

arXiv:1209.5490 [pdf, other]

doi 10.1109/WCRE.2008.45

Consistent Layout for Thematic Software Maps

Authors: Adrian Kuhn, Peter Loretan, Oscar Nierstrasz

Abstract: Software visualizations can provide a concise overview of a complex software system. Unfortunately, since software has no physical shape, there is no "natural" mapping of software to a two-dimensional space. As a consequence most visualizations tend to use a layout in which position and distance have no meaning, and consequently layout typically diverges from one visualization to another. We propose a consistent layout for software maps in which the position of a software artifact reflects its vocabulary, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, since the vocabulary of software artifacts tends to be stable over time.

Submitted 25 September, 2012; originally announced September 2012.

Comments: In Proceedings of 15th Working Conference on Reverse Engineering (WCRE'08), IEEE Computer Society Press, Los Alamitos CA, October 2008, pp. 209-218

arXiv:1007.4303 [pdf, other]

Embedding Spatial Software Visualization in the IDE: an Exploratory Study

Authors: Adrian Kuhn, David Erni, Oscar Nierstrasz

Abstract: Software visualization can be of great use for understanding and exploring a software system in an intuitive manner. Spatial representation of software is a promising approach of increasing interest. However, little is known about how developers interact with spatial visualizations that are embedded in the IDE. In this paper, we present a pilot study that explores the use of Software Cartography for program comprehension of an unknown system. We investigated whether developers establish a spatial memory of the system, whether clustering by topic offers a sound base layout, and how developers interact with maps. We report our results in the form of observations, hypotheses, and implications. Key findings are a) that developers made good use of the map to inspect search results and call graphs, and b) that developers found the base layout surprising and often confusing. We conclude with concrete advice for the design of embedded software maps.

Submitted 25 July, 2010; originally announced July 2010.

Comments: To appear in proceedings of SOFTVIS 2010 conference

arXiv:1001.2386 [pdf, other]

Towards Improving the Mental Model of Software Developers through Cartographic Visualization

Authors: Adrian Kuhn, David Erni, Oscar Nierstrasz

Abstract: Software is intangible and knowledge about software systems is typically tacit. The mental model of software developers is thus an important factor in software engineering. It is our vision that developers should be able to refer to code as being "up in the north", "over in the west", or "down-under in the south". We want to provide developers, and everyone else involved in software development, with a shared, spatial and stable mental model of their software project. We aim to reinforce this by embedding a cartographic visualization in the IDE (Integrated Development Environment). The visualization is always visible in the bottom-left, similar to the GPS navigation device for car drivers. For each development task, related information is displayed on the map. In this paper we present CODEMAP, an Eclipse plug-in, and report on preliminary results from an ongoing user study with professional developers and students.

Submitted 14 January, 2010; originally announced January 2010.
