-
What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study
Authors:
Nicolás E. Díaz Ferreyra,
Mojtaba Shahin,
Mansooreh Zahedi,
Sodiq Quadri,
Ricardo Scandariato
Abstract:
Self-Admitted Technical Debt (SATD) encompasses a wide array of sub-optimal design and implementation choices reported in software artefacts (e.g., code comments and commit messages) by developers themselves. Such reports have been central to the study of software maintenance and evolution over the last decades. However, they can also be deemed as dreadful sources of information on potentially exp…
▽ More
Self-Admitted Technical Debt (SATD) encompasses a wide array of sub-optimal design and implementation choices reported in software artefacts (e.g., code comments and commit messages) by developers themselves. Such reports have been central to the study of software maintenance and evolution over the last decades. However, they can also be deemed as dreadful sources of information on potentially exploitable vulnerabilities and security flaws. This work investigates the security implications of SATD from a technical and developer-centred perspective. On the one hand, it analyses whether security pointers disclosed inside SATD sources can be used to characterise vulnerabilities in Open-Source Software (OSS) projects and repositories. On the other hand, it delves into developers' perspectives regarding the motivations behind this practice, its prevalence, and its potential negative consequences. We followed a mixed-methods approach consisting of (i) the analysis of a preexisting dataset containing 8,812 SATD instances and (ii) an online survey with 222 OSS practitioners. We gathered 201 SATD instances through the dataset analysis and mapped them to different Common Weakness Enumeration (CWE) identifiers. Overall, 25 different types of CWEs were spotted across commit messages, pull requests, code comments, and issue sections, from which 8 appear among MITRE's Top-25 most dangerous ones. The survey shows that software practitioners often place security pointers across SATD artefacts to promote a security culture among their peers and help them spot flaky code sections, among other motives. However, they also consider such a practice risky as it may facilitate vulnerability exploits. Our findings suggest that preserving the contextual integrity of security pointers disseminated across SATD artefacts is critical to safeguard both commercial and OSS solutions against zero-day attacks.
△ Less
Submitted 2 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
CATMA: Conformance Analysis Tool For Microservice Applications
Authors:
Clinton Cao,
Simon Schneider,
Nicolás E. Díaz Ferreyra,
Sicco Verwer,
Annibale Panichella,
Riccardo Scandariato
Abstract:
The microservice architecture allows developers to divide the core functionality of their software system into multiple smaller services. However, this architectural style also makes it harder for them to debug and assess whether the system's deployment conforms to its implementation. We present CATMA, an automated tool that detects non-conformances between the system's deployment and implementati…
▽ More
The microservice architecture allows developers to divide the core functionality of their software system into multiple smaller services. However, this architectural style also makes it harder for them to debug and assess whether the system's deployment conforms to its implementation. We present CATMA, an automated tool that detects non-conformances between the system's deployment and implementation. It automatically visualizes and generates potential interpretations for the detected discrepancies. Our evaluation of CATMA shows promising results in terms of performance and providing useful insights. CATMA is available at \url{https://cyber-analytics.nl/catma.github.io/}, and a demonstration video is available at \url{https://youtu.be/WKP1hG-TDKc}.
△ Less
Submitted 23 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
How Dataflow Diagrams Impact Software Security Analysis: an Empirical Experiment
Authors:
Simon Schneider,
Nicolás E. Díaz Ferreyra,
Pierre-Jean Quéval,
Georg Simhandl,
Uwe Zdun,
Riccardo Scandariato
Abstract:
Models of software systems are used throughout the software development lifecycle. Dataflow diagrams (DFDs), in particular, are well-established resources for security analysis. Many techniques, such as threat modelling, are based on DFDs of the analysed application. However, their impact on the performance of analysts in a security analysis setting has not been explored before. In this paper, we…
▽ More
Models of software systems are used throughout the software development lifecycle. Dataflow diagrams (DFDs), in particular, are well-established resources for security analysis. Many techniques, such as threat modelling, are based on DFDs of the analysed application. However, their impact on the performance of analysts in a security analysis setting has not been explored before. In this paper, we present the findings of an empirical experiment conducted to investigate this effect. Following a within-groups design, participants were asked to solve security-relevant tasks for a given microservice application. In the control condition, the participants had to examine the source code manually. In the model-supported condition, they were additionally provided a DFD of the analysed application and traceability information linking model items to artefacts in source code. We found that the participants (n = 24) performed significantly better in answering the analysis tasks correctly in the model-supported condition (41% increase in analysis correctness). Further, participants who reported using the provided traceability information performed better in giving evidence for their answers (315% increase in correctness of evidence). Finally, we identified three open challenges of using DFDs for security analysis based on the insights gained in the experiment.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
Authors:
Catherine Tony,
Markus Mutas,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs prom…
▽ More
Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated nor documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. Such prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Regret, Delete, (Do Not) Repeat: An Analysis of Self-Cleaning Practices on Twitter After the Outbreak of the COVID-19 Pandemic
Authors:
Nicolás E. Díaz Ferreyra,
Gautam Kishore Shahi,
Catherine Tony,
Stefan Stieglitz,
Riccardo Scandariato
Abstract:
During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, ho** for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifesta…
▽ More
During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, ho** for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifestations of online regrets. In this work, we present an analysis of deleted content on Twitter during the outbreak of the COVID-19 pandemic. For this, we collected more than 3.67 million tweets describing COVID-19 symptoms (e.g., fever, cough, and fatigue) posted between January and April 2020. We observed that around 24% of the tweets containing personal pronouns were deleted either by their authors or by the platform after one year. As a practical application of the resulting dataset, we explored its suitability for the automatic classification of regrettable content on Twitter.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Developers Need Protection, Too: Perspectives and Research Challenges for Privacy in Social Coding Platforms
Authors:
Nicolás E. Díaz Ferreyra,
Abdessamad Imine,
Melina Vidoni,
Riccardo Scandariato
Abstract:
Social Coding Platforms (SCPs) like GitHub have become central to modern software engineering thanks to their collaborative and version-control features. Like in mainstream Online Social Networks (OSNs) such as Facebook, users of SCPs are subjected to privacy attacks and threats given the high amounts of personal and project-related data available in their profiles and software repositories. Howev…
▽ More
Social Coding Platforms (SCPs) like GitHub have become central to modern software engineering thanks to their collaborative and version-control features. Like in mainstream Online Social Networks (OSNs) such as Facebook, users of SCPs are subjected to privacy attacks and threats given the high amounts of personal and project-related data available in their profiles and software repositories. However, unlike in OSNs, the privacy concerns and practices of SCP users have not been extensively explored nor documented in the current literature. In this work, we present the preliminary results of an online survey (N=105) addressing developers' concerns and perceptions about privacy threats steaming from SCPs. Our results suggest that, although users express concern about social and organisational privacy threats, they often feel safe sharing personal and project-related information on these platforms. Moreover, attacks targeting the inference of sensitive attributes are considered more likely than those seeking to re-identify source-code contributors. Based on these findings, we propose a set of recommendations for future investigations addressing privacy and identity management in SCPs.
△ Less
Submitted 3 March, 2023;
originally announced March 2023.
-
GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences
Authors:
Catherine Tony,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API…
▽ More
GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that it is imperative to not neglect the misuses in API call sequences while using data sources like GitHub, to train models that generate code.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
ENAGRAM: An App to Evaluate Preventative Nudges for Instagram
Authors:
Nicolás E. Díaz Ferreyra,
Sina Ostendorf,
Esma Aïmeur,
Maritta Heisel,
Matthias Brand
Abstract:
Online self-disclosure is perhaps one of the last decade's most studied communication processes, thanks to the introduction of Online Social Networks (OSNs) like Facebook. Self-disclosure research has contributed significantly to the design of preventative nudges seeking to support and guide users when revealing private information in OSNs. Still, assessing the effectiveness of these solutions is…
▽ More
Online self-disclosure is perhaps one of the last decade's most studied communication processes, thanks to the introduction of Online Social Networks (OSNs) like Facebook. Self-disclosure research has contributed significantly to the design of preventative nudges seeking to support and guide users when revealing private information in OSNs. Still, assessing the effectiveness of these solutions is often challenging since changing or modifying the choice architecture of OSN platforms is practically unfeasible. In turn, the effectiveness of numerous nudging designs is supported primarily by self-reported data instead of actual behavioral information. This work presents ENAGRAM, an app for evaluating preventative nudges, and reports the first results of an empirical study conducted with it. Such a study aims to showcase how the app (and the data collected with it) can be leveraged to assess the effectiveness of a particular nudging approach. We used ENAGRAM as a vehicle to test a risk-based strategy for nudging the self-disclosure decisions of Instagram users. For this, we created two variations of the same nudge and tested it in a between-subjects experimental setting. Study participants (N=22) were recruited via Prolific and asked to use the app regularly for 7 days. An online survey was distributed at the end of the experiment to measure some privacy-related constructs. From the data collected with ENAGRAM, we observed lower (though non-significant) self-disclosure levels when applying risk-based interventions. The constructs measured with the survey were not significant either, except for participants' External Information Privacy Concerns. Our results suggest that (i) ENAGRAM is a suitable alternative for conducting longitudinal experiments in a privacy-friendly way, and (ii) it provides a flexible framework for the evaluation of a broad spectrum of nudging solutions.
△ Less
Submitted 18 August, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
-
Cybersecurity Discussions in Stack Overflow: A Developer-Centred Analysis of Engagement and Self-Disclosure Behaviour
Authors:
Nicolás E. Díaz Ferreyra,
Melina Vidoni,
Maritta Heisel,
Riccardo Scandariato
Abstract:
Stack Overflow (SO) is a popular platform among developers seeking advice on various software-related topics, including privacy and security. As for many knowledge-sharing websites, the value of SO depends largely on users' engagement, namely their willingness to answer, comment or post technical questions. Still, many of these questions (including cybersecurity-related ones) remain unanswered, pu…
▽ More
Stack Overflow (SO) is a popular platform among developers seeking advice on various software-related topics, including privacy and security. As for many knowledge-sharing websites, the value of SO depends largely on users' engagement, namely their willingness to answer, comment or post technical questions. Still, many of these questions (including cybersecurity-related ones) remain unanswered, putting the site's relevance and reputation into question. Hence, it is important to understand users' participation in privacy and security discussions to promote engagement and foster the exchange of such expertise. Objective: Based on prior findings on online social networks, this work elaborates on the interplay between users' engagement and their privacy practices in SO. Particularly, it analyses developers' self-disclosure behaviour regarding profile visibility and their involvement in discussions related to privacy and security. Method: We followed a mixed-methods approach by (i) analysing SO data from 1239 cybersecurity-tagged questions along with 7048 user profiles, and (ii) conducting an anonymous online survey (N=64). Results: About 33% of the questions we retrieved had no answer, whereas more than 50% had no accepted answer. We observed that "proactive" users tend to disclose significantly less information in their profiles than "reactive" and "unengaged" ones. However, no correlations were found between these engagement categories and privacy-related constructs such as Perceived Control or General Privacy Concerns. Implications: These findings contribute to (i) a better understanding of developers' engagement towards privacy and security topics, and (ii) to shape strategies promoting the exchange of cybersecurity expertise in SO.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
Conversational DevBots for Secure Programming: An Empirical Study on SKF Chatbot
Authors:
Catherine Tony,
Mohana Balasubramanian,
Nicolás E. Díaz Ferreyra,
Riccardo Scandariato
Abstract:
Conversational agents or chatbots are widely investigated and used across different fields including healthcare, education, and marketing. Still, the development of chatbots for assisting secure coding practices is in its infancy. In this paper, we present the results of an empirical study on SKF chatbot, a software-development bot (DevBot) designed to answer queries about software security. To th…
▽ More
Conversational agents or chatbots are widely investigated and used across different fields including healthcare, education, and marketing. Still, the development of chatbots for assisting secure coding practices is in its infancy. In this paper, we present the results of an empirical study on SKF chatbot, a software-development bot (DevBot) designed to answer queries about software security. To the best of our knowledge, SKF chatbot is one of the very few of its kind, thus a representative instance of conversational DevBots aiding secure software development. In this study, we collect and analyse empirical evidence on the effectiveness of SKF chatbot, while assessing the needs and expectations of its users (i.e., software developers). Furthermore, we explore the factors that may hinder the elaboration of more sophisticated conversational security DevBots and identify features for improving the efficiency of state-of-the-art solutions. All in all, our findings provide valuable insights pointing towards the design of more context-aware and personalized conversational DevBots for security engineering.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants
Authors:
Melina Vidoni,
Nicolás E. Díaz Ferreyra
Abstract:
Mining Software Repositories (MSRs) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques as they allow researchers to unveil issues and flaws in software development so as to analyse the different factors contributing to them. Hence, counting on fine-grained informati…
▽ More
Mining Software Repositories (MSRs) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques as they allow researchers to unveil issues and flaws in software development so as to analyse the different factors contributing to them. Hence, counting on fine-grained information about the repositories and sources being mined (e.g., server names, and contributors' identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants' privacy as their identities may be linked to flawed/sub-optimal programming practices (e.g., code smells, improper documentation), or vice-versa. Moreover, this can be extensible to close collaborators and community members resulting "guilty by association". This position paper aims to start a discussion about indirect participation in MSRs investigations, the dichotomy of 'privacy vs. utility' regarding sharing non-aggregated data, and its effects on privacy restrictions and ethical considerations for participant involvement.
△ Less
Submitted 24 February, 2022;
originally announced February 2022.
-
SoK: Security of Microservice Applications: A Practitioners' Perspective on Challenges and Best Practices
Authors:
Priyanka Billawa,
Anusha Bambhore Tukaram,
Nicolás E. Díaz Ferreyra,
Jan-Philipp Steghöfer,
Riccardo Scandariato,
Georg Simhandl
Abstract:
Cloud-based application deployment is becoming increasingly popular among businesses, thanks to the emergence of microservices. However, securing such architectures is a challenging task since traditional security concepts cannot be directly applied to microservice architectures due to their distributed nature. The situation is exacerbated by the scattered nature of guidelines and best practices a…
▽ More
Cloud-based application deployment is becoming increasingly popular among businesses, thanks to the emergence of microservices. However, securing such architectures is a challenging task since traditional security concepts cannot be directly applied to microservice architectures due to their distributed nature. The situation is exacerbated by the scattered nature of guidelines and best practices advocated by practitioners and organizations in this field. This research paper we aim to shay light over the current microservice security discussions hidden within Grey Literature (GL) sources. Particularly, we identify the challenges that arise when securing microservice architectures, as well as solutions recommended by practitioners to address these issues. For this, we conducted a systematic GL study on the challenges and best practices of microservice security present in the Internet with the goal of capturing relevant discussions in blogs, white papers, and standards. We collected 312 GL sources from which 57 were rigorously classified and analyzed. This analysis on the one hand validated past academic literature studies in the area of microservice security, but it also identified improvements to existing methodologies pointing towards future research directions.
△ Less
Submitted 2 September, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Community Detection for Access-Control Decisions: Analysing the Role of Homophily and Information Diffusion in Online Social Networks
Authors:
Nicolas E. Diaz Ferreyra,
Tobias Hecking,
Esma Aïmeur,
Maritta Heisel,
H. Ulrich Hoppe
Abstract:
Access-Control Lists (ACLs) (a.k.a. friend lists) are one of the most important privacy features of Online Social Networks (OSNs) as they allow users to restrict the audience of their publications. Nevertheless, creating and maintaining custom ACLs can introduce a high cognitive burden on average OSNs users since it normally requires assessing the trustworthiness of a large number of contacts. In…
▽ More
Access-Control Lists (ACLs) (a.k.a. friend lists) are one of the most important privacy features of Online Social Networks (OSNs) as they allow users to restrict the audience of their publications. Nevertheless, creating and maintaining custom ACLs can introduce a high cognitive burden on average OSNs users since it normally requires assessing the trustworthiness of a large number of contacts. In principle, community detection algorithms can be leveraged to support the generation of ACLs by map** a set of examples (i.e. contacts labelled as untrusted) to the emerging communities inside the user's ego-network. However, unlike users' access-control preferences, traditional community-detection algorithms do not take the homophily characteristics of such communities into account (i.e. attributes shared among members). Consequently, this strategy may lead to inaccurate ACL configurations and privacy breaches under certain homophily scenarios. This work investigates the use of community-detection algorithms for the automatic generation of ACLs in OSNs. Particularly, it analyses the performance of the aforementioned approach under different homophily conditions through a simulation model. Furthermore, since private information may reach the scope of untrusted recipients through the re-sharing affordances of OSNs, information diffusion processes are also modelled and taken explicitly into account. Altogether, the removal of gatekeeper nodes is further explored as a strategy to counteract unwanted data dissemination.
△ Less
Submitted 7 June, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Persuasion Meets AI: Ethical Considerations for the Design of Social Engineering Countermeasures
Authors:
Nicolas E. Díaz Ferreyra,
Esma Aïmeur,
Hicham Hage,
Maritta Heisel,
Catherine García van Hoogstraten
Abstract:
Privacy in Social Network Sites (SNSs) like Facebook or Instagram is closely related to people's self-disclosure decisions and their ability to foresee the consequences of sharing personal information with large and diverse audiences. Nonetheless, online privacy decisions are often based on spurious risk judgements that make people liable to reveal sensitive data to untrusted recipients and become…
▽ More
Privacy in Social Network Sites (SNSs) like Facebook or Instagram is closely related to people's self-disclosure decisions and their ability to foresee the consequences of sharing personal information with large and diverse audiences. Nonetheless, online privacy decisions are often based on spurious risk judgements that make people liable to reveal sensitive data to untrusted recipients and become victims of social engineering attacks. Artificial Intelligence (AI) in combination with persuasive mechanisms like nudging is a promising approach for promoting preventative privacy behaviour among the users of SNSs. Nevertheless, combining behavioural interventions with high levels of personalization can be a potential threat to people's agency and autonomy even when applied to the design of social engineering countermeasures. This paper elaborates on the ethical challenges that nudging mechanisms can introduce to the development of AI-based countermeasures, particularly to those addressing unsafe self-disclosure practices in SNSs. Overall, it endorses the elaboration of personalized risk awareness solutions as i) an ethical approach to counteract social engineering, and ii) as an effective means for promoting reflective privacy decisions.
△ Less
Submitted 27 September, 2020;
originally announced September 2020.
-
Learning from Online Regrets: From Deleted Posts to Risk Awareness in Social Network Sites
Authors:
Nicolas E. Diaz Ferreyra,
Rene Meis,
Maritta Heisel
Abstract:
Social Network Sites (SNSs) like Facebook or Instagram are spaces where people expose their lives to wide and diverse audiences. This practice can lead to unwanted incidents such as reputation damage, job loss or harassment when pieces of private information reach unintended recipients. As a consequence, users often regret to have posted private information in these platforms and proceed to delete…
▽ More
Social Network Sites (SNSs) like Facebook or Instagram are spaces where people expose their lives to wide and diverse audiences. This practice can lead to unwanted incidents such as reputation damage, job loss or harassment when pieces of private information reach unintended recipients. As a consequence, users often regret to have posted private information in these platforms and proceed to delete such content after having a negative experience. Risk awareness is a strategy that can be used to persuade users towards safer privacy decisions. However, many risk awareness technologies for SNSs assume that information about risks is retrieved and measured by an expert in the field. Consequently, risk estimation is an activity that is often passed over despite its importance. In this work we introduce an approach that employs deleted posts as risk information vehicles to measure the frequency and consequence level of self-disclosure patterns in SNSs. In this method, consequence is reported by the users through an ordinal scale and used later on to compute a risk criticality index. We thereupon show how this index can serve in the design of adaptive privacy nudges for SNSs.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.