Search | arXiv e-print repository

Relevant information in TDD experiment reporting

Authors: Fernando Uyaguari, Silvia T. Acuña, John W. Castro, Davide Fucci, Oscar Dieste, Sira Vegas

Abstract: Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and the measurement process. This article has three aims: (i) identify the response variable operationalizatio… ▽ More Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results;(ii) determine if the experiment reports describe the measurement process components that have an impact on the results. Sequential mixed method. The first part of the research adopts a quantitative approach applying a statistical análisis (SA) of the impact of the operationalization components on the experimental results. The second part follows on with a qualitative approach applying a systematic map** study (SMS). The test suites, intervention types and measurers have an influence on the measurements and results of the SA of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report either the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to assure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task and the intervention type in order to be able to reproduce the measurements and SA, as well as to replicate experiments and build dependable families of experiments. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.01055 [pdf, other]

Requirements Quality Research Artifacts: Recovery, Analysis, and Management Guideline

Authors: Julian Frattini, Lloyd Montgomery, Davide Fucci, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach

Abstract: Requirements quality research, which is dedicated to assessing and improving the quality of requirements specifications, is dependent on research artifacts like data sets (containing information about quality defects) and implementations (automatically detecting and removing these defects). However, recent research exposed that the majority of these research artifacts have become unavailable or ha… ▽ More Requirements quality research, which is dedicated to assessing and improving the quality of requirements specifications, is dependent on research artifacts like data sets (containing information about quality defects) and implementations (automatically detecting and removing these defects). However, recent research exposed that the majority of these research artifacts have become unavailable or have never been disclosed, which inhibits progress in the research domain. In this work, we aim to improve the availability of research artifacts in requirements quality research. To this end, we (1) extend an artifact recovery initiative, (2) empirically evaluate the reasons for artifact unavailability using Bayesian data analysis, and (3) compile a concise guideline for open science artifact disclosure. Our results include 10 recovered data sets and 7 recovered implementations, empirical support for artifact availability improving over time and the positive effect of public hosting services, and a pragmatic artifact management guideline open for community comments. With this work, we hope to encourage and support adherence to open science principles and improve the availability of research artifacts for the requirements research quality community. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.09895 [pdf, other]

Measuring the Fitness-for-Purpose of Requirements: An initial Model of Activities and Attributes

Authors: Julian Frattini, Jannik Fischbach, Davide Fucci, Michael Unterkalmsteiner, Daniel Mendez

Abstract: Requirements engineering aims to fulfill a purpose, i.e., inform subsequent software development activities about stakeholders' needs and constraints that must be met by the system under development. The quality of requirements artifacts and processes is determined by how fit for this purpose they are, i.e., how they impact activities affected by them. However, research on requirements quality lac… ▽ More Requirements engineering aims to fulfill a purpose, i.e., inform subsequent software development activities about stakeholders' needs and constraints that must be met by the system under development. The quality of requirements artifacts and processes is determined by how fit for this purpose they are, i.e., how they impact activities affected by them. However, research on requirements quality lacks a comprehensive overview of these activities and how to measure them. In this paper, we specify the research endeavor addressing this gap and propose an initial model of requirements-affected activities and their attributes. We construct a model from three distinct data sources, including both literature and empirical data. The results yield an initial model containing 24 activities and 16 attributes quantifying these activities. Our long-term goal is to develop evidence-based decision support on how to optimize the fitness for purpose of the RE phase to best support the subsequent, affected software development process. We do so by measuring the effect that requirements artifacts and processes have on the attributes of these activities. With the contribution at hand, we invite the research community to critically discuss our research roadmap and support the further evolution of the model. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.00415 [pdf, other]

On Develo** an Artifact-based Approach to Regulatory Requirements Engineering

Authors: Oleksandr Kosenkov, Michael Unterkalmsteiner, Jannik Fischbach, Daniel Mendez, Davide Fucci, Tony Gorschek

Abstract: Context: Regulatory acts are a challenging source when eliciting, interpreting, and analyzing requirements. Requirements engineers often need to involve legal experts who, however, may often not be available. This raises the need for approaches to regulatory Requirements Engineering (RE) covering and integrating both legal and engineering perspectives. Problem: Regulatory RE approaches need to c… ▽ More Context: Regulatory acts are a challenging source when eliciting, interpreting, and analyzing requirements. Requirements engineers often need to involve legal experts who, however, may often not be available. This raises the need for approaches to regulatory Requirements Engineering (RE) covering and integrating both legal and engineering perspectives. Problem: Regulatory RE approaches need to capture and reflect both the elementary concepts and relationships from a legal perspective and their seamless transition to concepts used to specify software requirements. No existing approach considers explicating and managing legal domain knowledge and engineering-legal coordination. Method: We conducted focus group sessions with legal researchers to identify the core challenges to establishing a regulatory RE approach. Based on our findings, we developed a candidate solution and conducted a first conceptual validation to assess its feasibility. Results: We introduce the first version of our Artifact Model for Regulatory Requirements Engineering (AM4RRE) and its conceptual foundation. It provides a blueprint for applying legal (modelling) concepts and well-established RE concepts. Our initial results suggest that artifact-centric RE can be applied to managing legal domain knowledge and engineering-legal coordination. Conclusions: The focus groups that served as a basis for building our model and the results from the expert validation both strengthen our confidence that we already provide a valuable basis for systematically integrating legal concepts into RE. This overcomes contemporary challenges to regulatory RE and serves as a basis for exposure to critical discussions in the community before continuing with the development of tool-supported extensions and large-scale empirical evaluations in practice. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: The paper was accepted to the 14th International Model-Driven Requirements Engineering (MoDRE) workshop co-located with the 32nd IEEE International Requirements Engineering Conference (RE 2024) in Reykjavik, Iceland

arXiv:2403.06685 [pdf, other]

NLP4RE Tools: Classification, Overview, and Management

Authors: Julian Frattini, Michael Unterkalmsteiner, Davide Fucci, Daniel Mendez

Abstract: Tools constitute an essential contribution to natural language processing for requirements engineering (NLP4RE) research. They are executable instruments that make research usable and applicable in practice. In this chapter, we first introduce a systematic classification of NLP4RE tools to improve the understanding of their types and properties. Then, we extend an existing overview with a systemat… ▽ More Tools constitute an essential contribution to natural language processing for requirements engineering (NLP4RE) research. They are executable instruments that make research usable and applicable in practice. In this chapter, we first introduce a systematic classification of NLP4RE tools to improve the understanding of their types and properties. Then, we extend an existing overview with a systematic summary of 126 NLP4RE tools published between April 2019 and June 2023 to ease reuse and evolution of existing tools. Finally, we provide instructions on how to create, maintain, and disseminate NLP4RE tools to support a more rigorous management and dissemination. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: The final version of the chapter will eventually appear in a book titled "Natural Language Processing for Requirements Engineering", edited by Alessio Ferrari and Gouri Ginde and published by Springer Nature Group

arXiv:2402.17954 [pdf, other]

Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps

Authors: Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy

Abstract: Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes. However, this broad language coverage hides performance gaps within languages, for example, across genders. Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight languag… ▽ More Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes. However, this broad language coverage hides performance gaps within languages, for example, across genders. Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight language families and two speaking conditions. Our findings reveal clear gender disparities, with the advantaged group varying across languages and models. Surprisingly, those gaps are not explained by acoustic or lexical properties. However, probing internal model states reveals a correlation with gendered performance gap. I.e., the easier it is to distinguish speaker gender in a language using probes, the more the gap reduces, favoring female speakers. Our results show that gender disparities persist even in state-of-the-art models. Our findings have implications for the improvement of multilingual ASR systems, underscoring the importance of accessibility to training data and nuanced evaluation to predict and mitigate gender gaps. We release all code and artifacts at https://github.com/g8a9/multilingual-asr-gender-gap. △ Less

Submitted 19 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 23 pages. Code and artifacts at https://github.com/g8a9/multilingual-asr-gender-gap

arXiv:2402.17123 [pdf, other]

Theory-Independent Realism

Authors: D. M. Fucci, R. M. Angelo

Abstract: The distinctive features of quantum mechanics, which set it apart from other physical theories, challenge our notions of realism. Recovering realism from purely philosophical grounds, a quantitative and operational criterion was proposed in the past, but solely for the context of quantum mechanics. We use a framework of generalized probabilistic theories to expand the notion of realism for a theor… ▽ More The distinctive features of quantum mechanics, which set it apart from other physical theories, challenge our notions of realism. Recovering realism from purely philosophical grounds, a quantitative and operational criterion was proposed in the past, but solely for the context of quantum mechanics. We use a framework of generalized probabilistic theories to expand the notion of realism for a theory-independent context, providing a criterion uniquely based on the probabilities assigned to measurement outcomes. More so, using robustness and the Kullback-Leibler divergence, we propose quantifiers for the realism of arbitrary physical properties given a particular state of a generic physical theory. These theory-independent quantifiers are then employed in quantum mechanics and we investigate their relation with another well-established irrealism measure. △ Less

Submitted 11 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.10800 [pdf, other]

doi 10.1145/3643664.3648211

A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment

Authors: Julian Frattini, Davide Fucci, Richard Torkar, Daniel Mendez

Abstract: The quality of requirements specifications may impact subsequent, dependent software engineering (SE) activities. However, empirical evidence of this impact remains scarce and too often superficial as studies abstract from the phenomena under investigation too much. Two of these abstractions are caused by the lack of frameworks for causal inference and frequentist methods which reduce complex data… ▽ More The quality of requirements specifications may impact subsequent, dependent software engineering (SE) activities. However, empirical evidence of this impact remains scarce and too often superficial as studies abstract from the phenomena under investigation too much. Two of these abstractions are caused by the lack of frameworks for causal inference and frequentist methods which reduce complex data to binary results. In this study, we aim to demonstrate (1) the use of a causal framework and (2) contrast frequentist methods with more sophisticated Bayesian statistics for causal inference. To this end, we reanalyze the only known controlled experiment investigating the impact of passive voice on the subsequent activity of domain modeling. We follow a framework for statistical causal inference and employ Bayesian data analysis methods to re-investigate the hypotheses of the original study. Our results reveal that the effects observed by the original authors turned out to be much less significant than previously assumed. This study supports the recent call to action in SE research to adopt Bayesian data analysis, including causal frameworks and Bayesian statistics, for more sophisticated causal inference. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: Published at the first International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE '24)

arXiv:2402.07145 [pdf, other]

Designing NLP-based solutions for requirements variability management: experiences from a design science study at Visma

Authors: Parisa Elahidoost, Michael Unterkalmsteiner, Davide Fucci, Peter Liljenberg, Jannik Fischbach

Abstract: Context and motivation: In this industry-academia collaborative project, a team of researchers, supported by a software architect, business analyst, and test engineer explored the challenges of requirement variability in a large business software development company. Question/problem: Following the design science paradigm, we studied the problem of requirements analysis and tracing in the context… ▽ More Context and motivation: In this industry-academia collaborative project, a team of researchers, supported by a software architect, business analyst, and test engineer explored the challenges of requirement variability in a large business software development company. Question/problem: Following the design science paradigm, we studied the problem of requirements analysis and tracing in the context of contractual documents, with a specific focus on managing requirements variability. This paper reports on the lessons learned from that experience, highlighting the strategies and insights gained in the realm of requirements variability management. Principal ideas/results: This experience report outlines the insights gained from applying design science in requirements engineering research in industry. We show and evaluate various strategies to tackle the issue of requirement variability. Contribution: We report on the iterations and how the solution development evolved in parallel with problem understanding. From this process, we derive five key lessons learned to highlight the effectiveness of design science in exploring solutions for requirement variability in contract-based environments. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.06041 [pdf, other]

A Prompt Response to the Demand for Automatic Gender-Neutral Translation

Authors: Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli

Abstract: Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models of… ▽ More Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models offer hitherto unforeseen possibilities, as they come with the distinct advantage of being versatile in various (sub)tasks when provided with explicit instructions. In this paper, we explore this potential to automate GNT by comparing MT with the popular GPT-4 model. Through extensive manual analyses, our study empirically reveals the inherent limitations of current MT systems in generating GNTs and provides valuable insights into the potential and challenges associated with prompting for neutrality. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted at EACL 2024

arXiv:2401.01154 [pdf, other]

Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Controlled Experiment

Authors: Julian Frattini, Davide Fucci, Richard Torkar, Lloyd Montgomery, Michael Unterkalmsteiner, Jannik Fischbach, Daniel Mendez

Abstract: It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence to the effect that requirements quality defects have on a software engineering ac… ▽ More It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence to the effect that requirements quality defects have on a software engineering activity that depends on this requirement. We conduct a controlled experiment in which 25 participants from industry and university generate domain models from four natural language requirements containing different quality defects. We evaluate the resulting models using both frequentist and Bayesian data analysis. Contrary to our expectations, our results show that the use of passive voice only has a minor impact on the resulting domain models. The use of ambiguous pronouns, however, shows a strong effect on various properties of the resulting domain models. Most notably, ambiguous pronouns lead to incorrect associations in domain models. Despite being equally advised against by literature and frequentist methods, the Bayesian data analysis shows that the two investigated quality defects have vastly different impacts on software engineering activities and, hence, deserve different levels of attention. Our employed method can be further utilized by researchers to improve reliable, detailed empirical evidence on requirements quality. △ Less

Submitted 1 July, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2310.15752 [pdf, other]

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

Authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

Abstract: When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST d… ▽ More When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023

arXiv:2310.15114 [pdf, other]

How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

Authors: Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli

Abstract: When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly-misleading vocal traits of the speaker… ▽ More When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly-misleading vocal traits of the speaker or default to the masculine gender, the most frequent in existing training corpora. To avoid such biased and not inclusive behaviors, the gender assignment of speaker-related expressions should be guided by externally-provided metadata about the speaker's gender. While previous work has shown that the most effective solution is represented by separate, dedicated gender-specific models, the goal of this paper is to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, easier to maintain. Our experiments demonstrate that a single multi-gender model outperforms gender-specialized ones when trained from scratch (with gender accuracy gains up to 12.9 for feminine forms), while fine-tuning from existing ST models does not lead to competitive results. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: To appear in CLiC-it 2023

arXiv:2310.06590 [pdf, ps, other]

No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

Authors: Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

Abstract: Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role. This can result in disparities in recognition accuracy between male and female speakers, primarily due to the under-representation of the latter group in the training data. While in the context of hybrid ASR models several solutions have been… ▽ More Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role. This can result in disparities in recognition accuracy between male and female speakers, primarily due to the under-representation of the latter group in the training data. While in the context of hybrid ASR models several solutions have been proposed, the gender bias issue has not been explicitly addressed in end-to-end neural architectures. To fill this gap, we propose a data augmentation technique that manipulates the fundamental frequency (f0) and formants. This technique reduces the data unbalance among genders by simulating voices of the under-represented female speakers and increases the variability within each gender group. Experiments on spontaneous English speech show that our technique yields a relative WER improvement up to 9.87% for utterances by female speakers, with larger gains for the least-represented f0 ranges. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted at ASRU 2023

arXiv:2310.05294 [pdf, other]

Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

Authors: Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli

Abstract: Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing… ▽ More Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing head-on on gender-neutral translation from English to Italian. We start from the essentials: proposing a dedicated benchmark and exploring automated evaluation methods. First, we introduce GeNTE, a natural, bilingual test set for gender-neutral translation, whose creation was informed by a survey on the perception and use of neutral language. Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation. △ Less

Submitted 8 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023

arXiv:2309.10355 [pdf, other]

doi 10.1007/s00766-023-00405-y

Requirements Quality Research: a harmonized Theory, Evaluation, and Roadmap

Authors: Julian Frattini, Lloyd Montgomery, Jannik Fischbach, Daniel Mendez, Davide Fucci, Michael Unterkalmsteiner

Abstract: High-quality requirements minimize the risk of propagating defects to later stages of the software development life cycle. Achieving a sufficient level of quality is a major goal of requirements engineering. This requires a clear definition and understanding of requirements quality. Though recent publications make an effort at disentangling the complex concept of quality, the requirements quality… ▽ More High-quality requirements minimize the risk of propagating defects to later stages of the software development life cycle. Achieving a sufficient level of quality is a major goal of requirements engineering. This requires a clear definition and understanding of requirements quality. Though recent publications make an effort at disentangling the complex concept of quality, the requirements quality research community lacks identity and clear structure which guides advances and puts new findings into an holistic perspective. In this research commentary we contribute (1) a harmonized requirements quality theory organizing its core concepts, (2) an evaluation of the current state of requirements quality research, and (3) a research roadmap to guide advancements in the field. We show that requirements quality research focuses on normative rules and mostly fails to connect requirements quality to its impact on subsequent software development activities, impeding the relevance of the research. Adherence to the proposed requirements quality theory and following the outlined roadmap will be a step towards amending this gap. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Requirements Eng (2023)

arXiv:2306.16127 [pdf, ps, other]

MLSMM: Machine Learning Security Maturity Model

Authors: Felix Jedrzejewski, Davide Fucci, Oleksandr Adamov

Abstract: Assessing the maturity of security practices during the development of Machine Learning (ML) based software components has not gotten as much attention as traditional software development. In this Blue Sky idea paper, we propose an initial Machine Learning Security Maturity Model (MLSMM) which organizes security practices along the ML-development lifecycle and, for each, establishes three levels o… ▽ More Assessing the maturity of security practices during the development of Machine Learning (ML) based software components has not gotten as much attention as traditional software development. In this Blue Sky idea paper, we propose an initial Machine Learning Security Maturity Model (MLSMM) which organizes security practices along the ML-development lifecycle and, for each, establishes three levels of maturity. We envision MLSMM as a step towards closer collaboration between industry and academia. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted at AdvML-Frontiers'23 as a Blue Sky Idea

arXiv:2304.10265 [pdf, other]

Replication in Requirements Engineering: the NLP for RE Case

Authors: Sallam Abualhaija, F. BaŞAk Aydemir, Fabiano Dalpiaz, Davide Dell'Anna, Alessio Ferrari, Xavier Franch, Davide Fucci

Abstract: [Context]} Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the het… ▽ More [Context]} Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks' inherent hairiness, and, in turn, the heterogeneous reporting structure. [Solution] To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. [Results] In this paper: (i) we report on hands-on experiences of replication, (ii) we review the state-of-the-art and extract replication-relevant information, (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction, and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. [Contribution] This study aims to create awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication, but can also be used in other contexts, e.g., for educational purposes. △ Less

Submitted 18 April, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.04670 [pdf, other]

Let's Stop Building at the Feet of Giants: Recovering unavailable Requirements Quality Artifacts

Authors: Julian Frattini, Lloyd Montgomery, Davide Fucci, Jannik Fischbach, Michael Unterkalmsteiner, Daniel Mendez

Abstract: Requirements quality literature abounds with publications presenting artifacts, such as data sets and tools. However, recent systematic studies show that more than 80% of these artifacts have become unavailable or were never made public, limiting reproducibility and reusability. In this work, we report on an attempt to recover those artifacts. To that end, we requested corresponding authors of una… ▽ More Requirements quality literature abounds with publications presenting artifacts, such as data sets and tools. However, recent systematic studies show that more than 80% of these artifacts have become unavailable or were never made public, limiting reproducibility and reusability. In this work, we report on an attempt to recover those artifacts. To that end, we requested corresponding authors of unavailable artifacts to recover and disclose them according to open science principles. Our results, based on 19 answers from 35 authors (54% response rate), include an assessment of the availability of requirements quality artifacts and a breakdown of authors' reasons for their continued unavailability. Overall, we improved the availability of seven data sets and seven implementations. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2301.10075 [pdf, other]

Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges

Authors: Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri

Abstract: Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative… ▽ More Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative of salient gender-related linguistic transfer problems. To define GNT, we review a selection of relevant institutional guidelines for gender-inclusive language, discuss its scenarios of use, and examine the technical challenges of performing GNT in MT, concluding with a discussion of potential solutions to encourage advancements toward greater inclusivity in MT. △ Less

Submitted 4 July, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: Accepted at the GITT workshop @ EAMT 2023

arXiv:2211.06189 [pdf, other]

doi 10.1016/j.infsof.2023.107201

An initial Theory to Understand and Manage Requirements Engineering Debt in Practice

Authors: Julian Frattini, Davide Fucci, Daniel Mendez, Rodrigo Spinola, Vladimir Mandic, Nebojsa Tausan, Muhammad Ovais Ahmad, Javier Gonzalez-Huerta

Abstract: Context: Advances in technical debt research demonstrate the benefits of applying the financial debt metaphor to support decision-making in software development activities. Although decision-making during requirements engineering has significant consequences, the debt metaphor in requirements engineering is inadequately explored. Objective: We aim to conceptualize how the debt metaphor applies to… ▽ More Context: Advances in technical debt research demonstrate the benefits of applying the financial debt metaphor to support decision-making in software development activities. Although decision-making during requirements engineering has significant consequences, the debt metaphor in requirements engineering is inadequately explored. Objective: We aim to conceptualize how the debt metaphor applies to requirements engineering by organizing concepts related to practitioners' understanding and managing of requirements engineering debt (RED). Method: We conducted two in-depth expert interviews to identify key requirements engineering debt concepts and construct a survey instrument. We surveyed 69 practitioners worldwide regarding their perception of the concepts and developed an initial analytical theory. Results: We propose a RED theory that aligns key concepts from technical debt research but emphasizes the specific nature of requirements engineering. In particular, the theory consists of 23 falsifiable propositions derived from the literature, the interviews, and survey results. Conclusions: The concepts of requirements engineering debt are perceived to be similar to their technical debt counterpart. Nevertheless, measuring and tracking requirements engineering debt are immature in practice. Our proposed theory serves as the first guide toward further research in this area. △ Less

Submitted 8 March, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Revised manuscript based on reviews

arXiv:2206.05959 [pdf, other]

A Live Extensible Ontology of Quality Factors for Textual Requirements

Authors: Julian Frattini, Lloyd Montgomery, Jannik Fischbach, Michael Unterkalmsteiner, Daniel Mendez, Davide Fucci

Abstract: Quality factors like passive voice or sentence length are commonly used in research and practice to evaluate the quality of natural language requirements since they indicate defects in requirements artifacts that potentially propagate to later stages in the development life cycle. However, as a research community, we still lack a holistic perspective on quality factors. This inhibits not only a co… ▽ More Quality factors like passive voice or sentence length are commonly used in research and practice to evaluate the quality of natural language requirements since they indicate defects in requirements artifacts that potentially propagate to later stages in the development life cycle. However, as a research community, we still lack a holistic perspective on quality factors. This inhibits not only a comprehensive understanding of the existing body of knowledge but also the effective use and evolution of these factors. To this end, we propose an ontology of quality factors for textual requirements, which includes (1) a structure framing quality factors and related elements and (2) a central repository and web interface making these factors publicly accessible and usable. We contribute the first version of both by applying a rigorous ontology development method to 105 eligible primary studies and construct a first version of the repository and interface. We illustrate the usability of the ontology and invite fellow researchers to a joint community effort to complete and maintain this knowledge repository. We envision our ontology to reflect the community's harmonized perception of requirements quality factors, guide reporting of new quality factors, and provide central access to the current body of knowledge. △ Less

Submitted 13 June, 2022; originally announced June 2022.

Comments: 7 pages, 1 figure, requirements engineering conference

arXiv:2206.04462 [pdf, other]

When Traceability Goes Awry: an Industrial Experience Report

Authors: Davide Fucci, Emil Alégroth, Thomas Axelsson

Abstract: The concept of traceability between artifacts is considered an enabler for software project success. This concept has received plenty of attention from the research community and is by many perceived to always be available in an industrial setting. In this industry-academia collaborative project, a team of researchers, supported by testing practitioners from a large telecommunication company, soug… ▽ More The concept of traceability between artifacts is considered an enabler for software project success. This concept has received plenty of attention from the research community and is by many perceived to always be available in an industrial setting. In this industry-academia collaborative project, a team of researchers, supported by testing practitioners from a large telecommunication company, sought to investigate the partner company's issues related to software quality. However, it was soon identified that the fundamental traceability links between requirements and test cases were missing. This lack of traceability impeded the implementation of a solution to help the company deal with its quality issues. In this experience report, we discuss lessons learned about the practical value of creating and maintaining traceability links in complex industrial settings and provide a cautionary tale for researchers. △ Less

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2205.02629 [pdf, other]

doi 10.18653/v1/2022.iwslt-1.13

Efficient yet Competitive Speech Translation: FBK@IWSLT2022

Authors: Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi

Abstract: The primary goal of this FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. As such, we first question the need of ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ra… ▽ More The primary goal of this FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. As such, we first question the need of ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ratio between source and target characters yields a quality improvement of 1 BLEU. Third, we compare different methods to reduce the detrimental effect of the audio segmentation mismatch between training data manually segmented at sentence level and inference data that is automatically segmented. Towards the same goal of training cost reduction, we participate in the simultaneous task with the same model trained for offline ST. The effectiveness of our lightweight training strategy is shown by the high score obtained on the MuST-C en-de corpus (26.7 BLEU) and is confirmed in high-resource data conditions by a 1.6 BLEU improvement on the IWSLT2020 test set over last year's winning system. △ Less

Submitted 5 May, 2022; originally announced May 2022.

Comments: IWSLT 2022 System Description

Journal ref: Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

arXiv:2201.05551 [pdf, other]

Cognition in Software Engineering: A Taxonomy and Survey of a Half-Century of Research

Authors: Fabian Fagerholm, Michael Felderer, Davide Fucci, Michael Unterkalmsteiner, Bogdan Marculescu, Markus Martini, Lars Göran Wallgren Tengberg, Robert Feldt, Bettina Lehtelä, Balázs Nagyváradi, Jehan Khattak

Abstract: Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and e… ▽ More Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and errors, and procedures to assess them both qualitatively and quantitatively. The taxonomy provides a useful tool to filter existing studies, classify new studies, and support researchers in getting familiar with a (sub) area. In the literature survey, we systematically collected and analysed 311 scientific papers spanning five decades and classified them using the cognitive concepts from the taxonomy. Our analysis shows that the most developed areas of research correspond to the four life-cycle stages, software requirements, design, construction, and maintenance. Most research is quantitative and focuses on knowledge, cognitive load, memory, and reasoning. Overall, the state of the art appears fragmented when viewed from the perspective of cognition. There is a lack of use of cognitive concepts that would represent a coherent picture of the cognitive processes active in specific tasks. Accordingly, we discuss the research gap in each cognitive concept and provide recommendations for future research. △ Less

Submitted 14 January, 2022; originally announced January 2022.

arXiv:2108.13059 [pdf, ps, other]

doi 10.1145/3475716.3484191

Vision for an Artefact-based Approach to Regulatory Requirements Engineering

Authors: Oleksandr Kosenkov, Michael Unterkalmsteiner, Daniel Mendez, Davide Fucci

Abstract: Background: Nowadays, regulatory requirements engineering (regulatory RE) faces challenges of interdisciplinary nature that cannot be tackled due to existing research gaps. Aims: We envision an approach to solve some of the challenges related to the nature and complexity of regulatory requirements, the necessity for domain knowledge, and the involvement of legal experts in regulatory RE. Method: W… ▽ More Background: Nowadays, regulatory requirements engineering (regulatory RE) faces challenges of interdisciplinary nature that cannot be tackled due to existing research gaps. Aims: We envision an approach to solve some of the challenges related to the nature and complexity of regulatory requirements, the necessity for domain knowledge, and the involvement of legal experts in regulatory RE. Method: We suggest the qualitative analysis of regulatory texts combined with the further case study to develop an empirical foundation for our research. Results: We outline our vision for the application of extended artefact-based modeling for regulatory RE. Conclusions: Empirical methodology is an essential instrument to address interdisciplinarity and complexity in regulatory RE. Artefact-based modeling supported by empirical results can solve a particular set of problems while not limiting the application of other methods and tools and facilitating the interaction between different fields of practice and research. △ Less

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2108.12411 [pdf, ps, other]

doi 10.1145/3475716.3484273

Towards a Methodology for Participant Selection in Software Engineering Experiments. A Vision of the Future

Authors: Valentina Lenarduzzi, Oscar Dieste, Davide Fucci, Sira Vegas

Abstract: Background. Software Engineering (SE) researchers extensively perform experiments with human subjects. Well-defined samples are required to ensure external validity. Samples are selected \textit{purposely} or by \textit{convenience}, limiting the generalizability of results. Objective. We aim to depict the current status of participants selection in empirical SE, identifying the main threats and h… ▽ More Background. Software Engineering (SE) researchers extensively perform experiments with human subjects. Well-defined samples are required to ensure external validity. Samples are selected \textit{purposely} or by \textit{convenience}, limiting the generalizability of results. Objective. We aim to depict the current status of participants selection in empirical SE, identifying the main threats and how they are mitigated. We draft a robust approach to participants' selection. Method. We reviewed existing participants' selection guidelines in SE, and performed a preliminary literature review to find out how participants' selection is conducted in SE in practice. % and 3) we summarized the main issues identified. Results. We outline a new selection methodology, by 1) defining the characteristics of the desired population, 2) locating possible sources of sampling available for researchers, and 3) identifying and reducing the "distance" between the selected sample and its corresponding population. Conclusion. We propose a roadmap to develop and empirically validate the selection methodology. △ Less

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2105.03312 [pdf, other]

doi 10.1016/j.jss.2021.110937

Studying Test-Driven Development and its Retainment Over a Six-month Time Span

Authors: Maria Teresa Baldassarre, Danilo Caivano, Davide Fucci, Natalia Juristo, Simone Romano, Giuseppe Scanniello, BurakTurhan

Abstract: In this paper, we investigate the effect of TDD, as compared to a non-TDD approach, as well as its retainment (or retention) over a time span of (about) six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of s… ▽ More In this paper, we investigate the effect of TDD, as compared to a non-TDD approach, as well as its retainment (or retention) over a time span of (about) six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of software products nor developers' productivity. However, we observed that the participants applying TDD produced significantly more tests, with a higher fault-detection capability than those using a non-TDD approach. As for the retainment of TDD, we found that TDD is retained by novice developers for at least six months. △ Less

Submitted 11 May, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

Journal ref: Journal of Systems and Software, Volume 176, 2021, 110937, ISSN 0164-1212

arXiv:2104.02410 [pdf, other]

Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews

Authors: Alessio Ferrari, Thaide Huichapa, Paola Spoletini, Nicole Novielli, Davide Fucci, Daniela Girardi

Abstract: Capturing users' engagement is crucial for gathering feedback about the features of a software product. In a market-driven context, current approaches to collect and analyze users' feedback are based on techniques leveraging information extracted from product reviews and social media. These approaches are hardly applicable in bespoke software development, or in contexts in which one needs to gathe… ▽ More Capturing users' engagement is crucial for gathering feedback about the features of a software product. In a market-driven context, current approaches to collect and analyze users' feedback are based on techniques leveraging information extracted from product reviews and social media. These approaches are hardly applicable in bespoke software development, or in contexts in which one needs to gather information from specific users. In such cases, companies need to resort to face-to-face interviews to get feedback on their products. In this paper, we propose to utilize biometric data, in terms of physiological and voice features, to complement interviews with information about the engagement of the user on the discussed product-relevant topics. We evaluate our approach by interviewing users while gathering their physiological data (i.e., biofeedback) using an Empatica E4 wristband, and capturing their voice through the default audio-recorder of a common laptop. Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data (F1=0.72), and that voice features alone are sufficiently effective (F1=0.71). Our work contributes with one the first studies in requirements engineering in which biometrics are used to identify emotions. This is also the first study in software engineering that considers voice analysis. The usage of voice features could be particularly helpful for emotion-aware requirements elicitation in remote communication, either performed by human analysts or voice-based chatbots, and can also be exploited to support the analysis of meetings in software engineering research. △ Less

Submitted 1 July, 2024; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: This paper contains updated experimental results with respect to the initial version

MSC Class: 68N30 ACM Class: D.2.1; D.2.2

arXiv:2011.11942 [pdf, other]

A Family of Experiments on Test-Driven Development

Authors: Adrian Santos, Sira Vegas, Oscar Dieste, Fernando Uyaguari, Aysee Tosun, Davide Fucci, Burak Turhan, Giuseppe Scanniello, Simone Romano, Itir Karac, Marco Kuhrmann, Vladimir Mandic, Robert Ramac, Dietmar Pfahl, Christian Engblom, Jarno Kyykka, Kerli Rungi, Carolina Palomeque, Jaroslav Spisak, Markku Oivo, Natalia Juristo

Abstract: Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make… ▽ More Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, or the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than for students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative, respectively). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process... △ Less

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2009.01722 [pdf, other]

What Makes Agile Test Artifacts Useful? An Activity-Based Quality Model from a Practitioners' Perspective

Authors: Jannik Fischbach, Henning Femmer, Daniel Mendez, Davide Fucci, Andreas Vogelsang

Abstract: Background: The artifacts used in Agile software testing and the reasons why these artifacts are used are fairly well-understood. However, empirical research on how Agile test artifacts are eventually designed in practice and which quality factors make them useful for software testing remains sparse. Aims: Our objective is two-fold. First, we identify current challenges in using test artifacts to… ▽ More Background: The artifacts used in Agile software testing and the reasons why these artifacts are used are fairly well-understood. However, empirical research on how Agile test artifacts are eventually designed in practice and which quality factors make them useful for software testing remains sparse. Aims: Our objective is two-fold. First, we identify current challenges in using test artifacts to understand why certain quality factors are considered good or bad. Second, we build an Activity-Based Artifact Quality Model that describes what Agile test artifacts should look like. Method: We conduct an industrial survey with 18 practitioners from 12 companies operating in seven different domains. Results: Our analysis reveals nine challenges and 16 factors describing the quality of six test artifacts from the perspective of Agile testers. Interestingly, we observed mostly challenges regarding language and traceability, which are well-known to occur in non-Agile projects. Conclusions: Although Agile software testing is becoming the norm, we still have little confidence about general do's and don'ts going beyond conventional wisdom. This study is the first to distill a list of quality factors deemed important to what can be considered as useful test artifacts. △ Less

Submitted 3 September, 2020; originally announced September 2020.

arXiv:2008.12528 [pdf, ps, other]

Researcher Bias in Software Engineering Experiments: a Qualitative Investigation

Authors: Simone Romano, Davide Fucci, Giuseppe Scanniello, Maria Teresa Baldassarre, Burak Turhan, Natalia Juristo

Abstract: Researcher Bias (RB) occurs when researchers influence the results of an empirical study based on their expectations.RB might be due to the use of Questionable Research Practices(QRPs). In research fields like medicine, blinding techniques have been applied to counteract RB. We conducted an explorative qualitative survey to investigate RB in Software Engineering (SE)experiments, with respect to (i… ▽ More Researcher Bias (RB) occurs when researchers influence the results of an empirical study based on their expectations.RB might be due to the use of Questionable Research Practices(QRPs). In research fields like medicine, blinding techniques have been applied to counteract RB. We conducted an explorative qualitative survey to investigate RB in Software Engineering (SE)experiments, with respect to (i) QRPs potentially leading to RB, (ii) causes behind RB, and (iii) possible actions to counteract including blinding techniques. Data collection was based on semi-structured interviews. We interviewed nine active experts in the empirical SE community. We then analyzed the transcripts of these interviews through thematic analysis. We found that some QRPs are acceptable in certain cases. Also, it appears that the presence of RB is perceived in SE and, to counteract RB, a number of solutions have been highlighted: some are intended for SE researchers and others for the boards of SE research outlets. △ Less

Submitted 28 August, 2020; originally announced August 2020.

Comments: Published at SEAA2020

arXiv:2007.09863 [pdf, other]

doi 10.1145/3382494.3410687

Why Research on Test-Driven Development is Inconclusive?

Authors: Mohammad Ghafari, Timm Gross, Davide Fucci, Michael Felderer

Abstract: [Background] Recent investigations into the effects of Test-Driven Development (TDD) have been contradictory and inconclusive. This hinders development teams to use research results as the basis for deciding whether and how to apply TDD. [Aim] To support researchers when designing a new study and to increase the applicability of TDD research in the decision-making process in the industrial context… ▽ More [Background] Recent investigations into the effects of Test-Driven Development (TDD) have been contradictory and inconclusive. This hinders development teams to use research results as the basis for deciding whether and how to apply TDD. [Aim] To support researchers when designing a new study and to increase the applicability of TDD research in the decision-making process in the industrial context, we aim at identifying the reasons behind the inconclusive research results in TDD. [Method] We studied the state of the art in TDD research published in top venues in the past decade, and analyzed the way these studies were set up. [Results] We identified five categories of factors that directly impact the outcome of studies on TDD. [Conclusions] This work can help researchers to conduct more reliable studies, and inform practitioners of risks they need to consider when consulting research on TDD. △ Less

Submitted 19 July, 2020; originally announced July 2020.

Comments: ESEM '20: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), October 8--9, 2020, Bari, Italy

arXiv:2004.07524 [pdf, other]

Results from a replicated experiment on the affective reactions of novice developers when applying test-driven development

Authors: Simone Romano, Giuseppe Scanniello, Maria Teresa Baldassarre, Davide Fucci, Danilo Caivano

Abstract: Test-driven Development (TDD) is an incremental approach to software development. Despite it is claimed to improve both quality of software and developers' productivity, the research on the claimed effects of TDD has so far shown inconclusive results. Some researchers have ascribed these inconclusive results to the negative affective states that TDD would provoke. A previous (baseline) experiment… ▽ More Test-driven Development (TDD) is an incremental approach to software development. Despite it is claimed to improve both quality of software and developers' productivity, the research on the claimed effects of TDD has so far shown inconclusive results. Some researchers have ascribed these inconclusive results to the negative affective states that TDD would provoke. A previous (baseline) experiment has, therefore, studied the affective reactions of (novice) developers---i.e., 29 third-year undergraduates in Computer Science (CS)---when practicing TDD to implement software. To validate the results of the baseline experiment, we conducted a replicated experiment that studies the affective reactions of novice developers when applying TDD to develop software. Developers in the treatment group carried out a development task using TDD, while those in the control group used a non-TDD approach. To measure the affective reactions of developers, we used the Self-Assessment Manikin instrument complemented with a liking dimension. The most important differences between the baseline and replicated experiments are: (i) the kind of novice developers involved in the experiments---third-year vs. second-year undergraduates in CS from two different universities; and (ii) their number---29 vs. 59. The results of the replicated experiment do not show any difference in the affective reactions of novice developers. Instead, the results of the baseline experiment suggest that developers seem to like TDD less as compared to a non-TDD approach and that developers following TDD seem to like implementing code less than the other developers, while testing code seems to make them less happy. △ Less

Submitted 16 April, 2020; originally announced April 2020.

Comments: XP2020

arXiv:2001.09177 [pdf, other]

doi 10.1145/3377811.3380374

Recognizing Developers' Emotions while Programming

Authors: Daniela Girardi, Nicole Novielli, Davide Fucci, Filippo Lanubile

Abstract: Developers experience a wide range of emotions during programming tasks, which may have an impact on job performance. In this paper, we present an empirical study aimed at (i) investigating the link between emotion and progress, (ii) understanding the triggers for developers' emotions and the strategies to deal with negative ones, (iii) identifying the minimal set of non-invasive biometric sensors… ▽ More Developers experience a wide range of emotions during programming tasks, which may have an impact on job performance. In this paper, we present an empirical study aimed at (i) investigating the link between emotion and progress, (ii) understanding the triggers for developers' emotions and the strategies to deal with negative ones, (iii) identifying the minimal set of non-invasive biometric sensors for emotion recognition during programming task. Results confirm previous findings about the relation between emotions and perceived productivity. Furthermore, we show that developers' emotions can be reliably recognized using only a wristband capturing the electrodermal activity and heart-related metrics. △ Less

Submitted 6 May, 2021; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: Accepted for publication at ICSE2020 Technical Track

Journal ref: In Proceedings of the 42nd International Conference on Software Engineering, Seoul, Republic of Korea, May 23-29, 2020 (ICSE '20),12 pages

arXiv:1908.05781 [pdf, other]

doi 10.1103/PhysRevA.100.062101

Tripartite realism-based quantum nonlocality

Authors: D. M. Fucci, R. M. Angelo

Abstract: From an operational criterion of physical reality, a quantifier of realism-based nonlocality was recently introduced for two-part quantum states. This measure has shown to capture aspects that are rather different from Bell nonlocality. Here we take a step further and introduce a tripartite realism-based nonlocality quantifier. We show that this measure reduces to genuine tripartite entanglement f… ▽ More From an operational criterion of physical reality, a quantifier of realism-based nonlocality was recently introduced for two-part quantum states. This measure has shown to capture aspects that are rather different from Bell nonlocality. Here we take a step further and introduce a tripartite realism-based nonlocality quantifier. We show that this measure reduces to genuine tripartite entanglement for a certain class of pure tripartite states and manifests itself in correlated mixed states even in the absence of quantum correlations. A case study for noisy GHZ and W states points out the existence of scenarios where the realism-based nonlocality is monogamous. △ Less

Submitted 15 August, 2019; originally announced August 2019.

Comments: 5 pages, 2 figures

Journal ref: Phys. Rev. A 100, 062101 (2019)

arXiv:1907.12290 [pdf, other]

An Empirical Assessment on Affective Reactions of Novice Developers when Applying Test-Driven Development

Authors: Simone Romano, Davide Fucci, Maria Teresa Baldassarre, Danilo Caivano, Giuseppe Scanniello

Abstract: We study whether and in which phase Test-Driven Development (TDD) influences affective states of novice developers in terms of pleasure, arousal, dominance, and liking. We performed a controlled experiment with 29 novice developers. Developers in the treatment group performed a development task using TDD, whereas those in the control group used a non-TDD development approach. We compared the affec… ▽ More We study whether and in which phase Test-Driven Development (TDD) influences affective states of novice developers in terms of pleasure, arousal, dominance, and liking. We performed a controlled experiment with 29 novice developers. Developers in the treatment group performed a development task using TDD, whereas those in the control group used a non-TDD development approach. We compared the affective reactions to the development approaches, as well as to the implementation and testing phases, exploiting a lightweight, powerful, and widely used tool, i.e., Self-Assessment Manikin. We observed that there is a difference between the two development approaches in terms of affective reactions. Therefore, it seems that affective reactions play an important role when applying TDD and their investigation could help researchers to better understand such a development approach △ Less

Submitted 29 July, 2019; originally announced July 2019.

Comments: Accepted for publication at the 20th International Conference on Product-Focused Software Process Improvement (PROFES19)

arXiv:1907.10887 [pdf, other]

Towards an Holistic Definition of Requirements Debt

Authors: Valentina Lenarduzzi, Davide Fucci

Abstract: When not appropriately managed, technical debt is considered to have negative effects on the long term success of a software project. However, how the debt metaphor applies to requirements engineering in general, and to requirements engineering activities in particular, is not well understood. Grounded in the existing literature, we present a holistic definition of requirements debt which include… ▽ More When not appropriately managed, technical debt is considered to have negative effects on the long term success of a software project. However, how the debt metaphor applies to requirements engineering in general, and to requirements engineering activities in particular, is not well understood. Grounded in the existing literature, we present a holistic definition of requirements debt which include debt incurred during the identification, formalization, and implementation of requirements. We outline future assessment to validate and further refine our proposed definition. This conceptualization is the first step towards a requirements debt monitoring framework to support stakeholders decisions, such as when to incur and eventually pay back requirements debt, and at what costs △ Less

Submitted 25 July, 2019; originally announced July 2019.

Journal ref: ESEM2019 Vision paper track

arXiv:1907.09807 [pdf, other]

doi 10.1145/3338906.3338943

On Using Machine Learning to Identify Knowledge in API Reference Documentation

Authors: Davide Fucci, Alireza Mollaalizadehbahnemiri, Walid Maalej

Abstract: Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowled… ▽ More Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) and deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., multiple binary classifiers) the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary. For four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification) deep learning outperforms naïve baselines and traditional machine learning achieving a MacroAUC up to 79%. We also compared classifiers using embeddings pre-trained on generic text corpora and StackOverflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. The accuracy related to the remaining types seems API-specific. We discuss our results and how they inform the development of tools for supporting developers sharing and accessing API knowledge. Published article: https://doi.org/10.1145/3338906.3338943 △ Less

Submitted 23 July, 2019; originally announced July 2019.

Journal ref: ESEC/FSE2019

arXiv:1903.03426 [pdf, other]

A Replication Study on Code Comprehension and Expertise using Lightweight Biometric Sensors

Authors: Davide Fucci, Daniela Girardi, Nicole Novielli, Luigi Quaranta, Filippo Lanubile

Abstract: Code comprehension has been recently investigated from physiological and cognitive perspectives through the use of medical imaging. Floyd et al (i.e., the original study) used fMRI to classify the type of comprehension tasks performed by developers and relate such results to their expertise. We replicate the original study using lightweight biometrics sensors which participants (28 undergrads in c… ▽ More Code comprehension has been recently investigated from physiological and cognitive perspectives through the use of medical imaging. Floyd et al (i.e., the original study) used fMRI to classify the type of comprehension tasks performed by developers and relate such results to their expertise. We replicate the original study using lightweight biometrics sensors which participants (28 undergrads in computer science) wore when performing comprehension tasks on source code and natural language prose. We developed machine learning models to automatically identify what kind of tasks developers are working on leveraging their brain-, heart-, and skin-related signals. The best improvement over the original study performance is achieved using solely the heart signal obtained through a single device (BAC 87% vs. 79.1%). Differently from the original study, we were not able to observe a correlation between the participants' expertise and the classifier performance (tau = 0.16, p = 0.31). Our findings show that lightweight biometric sensors can be used to accurately recognize comprehension tasks opening interesting scenarios for research and practice. △ Less

Submitted 2 April, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: Author version submitted to ICPC2019 (Replication track)

arXiv:1808.02284 [pdf, other]

Needs and Challenges for a Platform to Support Large-scale Requirements Engineering. A Multiple Case Study

Authors: Davide Fucci, Cristina Palomares, Dolors Costal, Xavier Franch, Mikko Raatikainen, Martin Stettinger, Zijad Kurtanovic, Tero Kojo, Lars Koenig, Andreas Falkner, Gottfried Schenner, Fabrizio Brasca, Tomi Männistö, Alexander Felfernig, Walid Maalej

Abstract: Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as number and heterogeneity of stakeholders, motivate the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics a… ▽ More Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as number and heterogeneity of stakeholders, motivate the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirement engineering for individual stakeholders, based on the current pain-points of their organizations when dealing with a large number requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain-point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimation of how long a requirements engineering task will take to complete, along with an easier requirements dependency identification and requirements reuse strategy. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirement engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing or design new practices and tools. △ Less

Submitted 6 September, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: Accepted for publication to the 12th International Symposium on Empirical Software Engineering and Measurement (ESEM18)

arXiv:1807.04100 [pdf, other]

The Effect of Noise on Sofware Engineers' Performance

Authors: Simone Romano, Giuseppe Scanniello, Davide Fucci, Natalia Juristo, Burak Turhan

Abstract: Background: Noise, defined as an unwanted sound, is one of the commonest factors that could affect people's performance in their daily work activities. The software engineering research community has marginally investigated the effects of noise on software engineers' performance. Aims: We studied if noise affects software engineers' performance in (i) comprehending functional requirements and (ii)… ▽ More Background: Noise, defined as an unwanted sound, is one of the commonest factors that could affect people's performance in their daily work activities. The software engineering research community has marginally investigated the effects of noise on software engineers' performance. Aims: We studied if noise affects software engineers' performance in (i) comprehending functional requirements and (ii) fixing faults in the source code. Method: We conducted two experiments with final-year undergraduate students in Computer Science. In the first experiment, we asked 55 students to comprehend functional requirements exposing them or not to noise, while in the second experiment 42 students were asked to fix faults in Java code. Results: The participants in the second experiment, when exposed to noise, had significantly worse performance in fixing faults in the source code. On the other hand, we did not observe any statistically significant difference in the first experiment. Conclusions: Fixing faults in source code seems to be more vulnerable to noise than comprehending functional requirements. △ Less

Submitted 11 July, 2018; originally announced July 2018.

Comments: ESEM18, Oulu (Finland), October 2018

arXiv:1807.02971 [pdf, other]

A Longitudinal Cohort Study on the Retainment of Test-Driven Development

Authors: Davide Fucci, Simone Romano, Maria Teresa Baldassarre, Danilo Caivano, Giuseppe Scanniello, Burak Thuran, Natalia Juristo

Abstract: Background: Test-Driven Development (TDD) is an agile software development practice, which is claimed to boost both external quality of software products and developers' productivity. Aims: We want to study (i) the TDD effects on the external quality of software products as well as the developers' productivity, and (ii) the retainment of TDD over a period of five months. Method: We conducted a (qu… ▽ More Background: Test-Driven Development (TDD) is an agile software development practice, which is claimed to boost both external quality of software products and developers' productivity. Aims: We want to study (i) the TDD effects on the external quality of software products as well as the developers' productivity, and (ii) the retainment of TDD over a period of five months. Method: We conducted a (quantitative) longitudinal cohort study with 30 third year undergraduate students in Computer Science at the University of Bari in Italy. Results: The use of TDD has a statistically significant effect neither on the external quality of software products nor on the developers' productivity. However, we observed that participants using TDD produced significantly more tests than those applying a non-TDD development process and that the retainment of TDD is particularly noticeable in the amount of tests written. Conclusions: Our results should encourage software companies to adopt TDD because who practices TDD tends to write more tests---having more tests can come in handy when testing software systems or localizing faults---and it seems that novice developers retain TDD. △ Less

Submitted 9 July, 2018; originally announced July 2018.

Comments: ESEM, October 2018, Oulu, Finland

arXiv:1806.02592 [pdf, other]

A Simple NLP-based Approach to Support Onboarding and Retention in Open Source Communities

Authors: Christoph Stanik, Lloyd Montgomery, Daniel Martens, Davide Fucci, Walid Maalej

Abstract: Successful open source communities are constantly looking for new members and hel** them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that n… ▽ More Successful open source communities are constantly looking for new members and hel** them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues, that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest, achieved up to 91% precision (F1-score 72%) towards the first goal while for the second goal, Decision Tree achieved a precision of 92% (F1-score 91%). A qualitative evaluation gave insights on what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues. △ Less

Submitted 16 August, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

arXiv:1805.02544 [pdf, other]

Need for Sleep: the Impact of a Night of Sleep Deprivation on Novice Developers' Performance

Authors: Davide Fucci, Giuseppe Scanniello, Simone Romano, Natalia Juristo

Abstract: We present a quasi-experiment to investigate whether, and to what extent, sleep deprivation impacts the performance of novice software developers using the agile practice of test-first development (TFD). We recruited 45 undergraduates and asked them to tackle a programming task. Among the participants, 23 agreed to stay awake the night before carrying out the task, while 22 slept usually. We analy… ▽ More We present a quasi-experiment to investigate whether, and to what extent, sleep deprivation impacts the performance of novice software developers using the agile practice of test-first development (TFD). We recruited 45 undergraduates and asked them to tackle a programming task. Among the participants, 23 agreed to stay awake the night before carrying out the task, while 22 slept usually. We analyzed the quality (i.e., the functional correctness) of the implementations delivered by the participants in both groups, their engagement in writing source code (i.e., the amount of activities performed in the IDE while tackling the programming task) and ability to apply TFD (i.e., the extent to which a participant can use this practice). By comparing the two groups of participants, we found that a single night of sleep deprivation leads to a reduction of 50% in the quality of the implementations. There is important evidence that the developers' engagement and their prowess to apply TFD are negatively impacted. Our results also show that sleep-deprived developers make more fixes to syntactic mistakes in the source code. We conclude that sleep deprivation has possibly disruptive effects on software development activities. The results open opportunities for improving developers' performance by integrating the study of sleep with other psycho-physiological factors in which the software engineering research community has recently taken an interest in. △ Less

Submitted 7 May, 2018; originally announced May 2018.

Comments: Accepted to IEEE Transactions on Software Engineering

arXiv:1803.10587 [pdf]

A First Implementation of a Design Thinking Workshop During a Mobile App Development Project Course

Authors: Yen Dieu Pham, Davide Fucci, Walid Maalej

Abstract: Due to their characteristics, millennials prefer learning-by-doing and social learning, such as project-based learning. However, software development projects require not only technical skills but also creativity; Design Thinking can serve such purpose. We conducted a workshop following the Design Thinking approach of the d.school, to help students generating ideas for a mobile app development pro… ▽ More Due to their characteristics, millennials prefer learning-by-doing and social learning, such as project-based learning. However, software development projects require not only technical skills but also creativity; Design Thinking can serve such purpose. We conducted a workshop following the Design Thinking approach of the d.school, to help students generating ideas for a mobile app development project course. On top of the details for implementing the workshop, we report our observations, lessons learned, and provide suggestions for further implementation. △ Less

Submitted 28 March, 2018; originally announced March 2018.

Comments: Second IEEE/ACM International Workshop on Software Engineering Education for Millennials

arXiv:1707.08824 [pdf, other]

doi 10.1145/3121257.3121260

Find, Understand, and Extend Development Screencasts on YouTube

Authors: Mathias Ellmann, Alexander Oeser, Davide Fucci, Walid Maalej

Abstract: A software development screencast is a video that captures the screen of a developer working on a particular task while explaining its implementation details. Due to the increased popularity of software development screencasts (e.g., available on YouTube), we study how and to what extent they can be used as additional source of knowledge to answer developer's questions about, for example, the use… ▽ More A software development screencast is a video that captures the screen of a developer working on a particular task while explaining its implementation details. Due to the increased popularity of software development screencasts (e.g., available on YouTube), we study how and to what extent they can be used as additional source of knowledge to answer developer's questions about, for example, the use of a specific API. We first differentiate between development and other types of screencasts using video frame analysis. By using the Cosine algorithm, developers can expect ten development screencasts in the top 20 out of 100 different YouTube videos. We then extracted popular development topics on which screencasts are reporting on YouTube: database operations, system set-up, plug-in development, game development, and testing. Besides, we found six recurring tasks performed in development screencasts, such as object usage and UI operations. Finally, we conducted a similarity analysis by considering only the spoken words (i.e., the screencast transcripts but not the text that might appear in a scene) to link API documents, such as the Javadoc, to the appropriate screencasts. By using Cosine similarity, we identified 38 relevant documents in the top 20 out of 9455 API documents. △ Less

Submitted 27 July, 2017; originally announced July 2017.

arXiv:1703.01078 [pdf, ps, other]

On the Presence of Green and Sustainable Software Engineering in Higher Education Curricula

Authors: Damiano Torre, Giuseppe Procaccianti, Davide Fucci, Sonja Lutovac, Giuseppe Scanniello

Abstract: Nowadays, software is pervasive in our everyday lives. Its sustainability and environmental impact have become major factors to be considered in the development of software systems. Millennials-the newer generation of university students-are particularly keen to learn about and contribute to a more sustainable and green society. The need for training on green and sustainable topics in software eng… ▽ More Nowadays, software is pervasive in our everyday lives. Its sustainability and environmental impact have become major factors to be considered in the development of software systems. Millennials-the newer generation of university students-are particularly keen to learn about and contribute to a more sustainable and green society. The need for training on green and sustainable topics in software engineering has been reflected in a number of recent studies. The goal of this paper is to get a first understanding of what is the current state of teaching sustainability in the software engineering community, what are the motivations behind the current state of teaching, and what can be done to improve it. To this end, we report the findings from a targeted survey of 33 academics on the presence of green and sustainable software engineering in higher education. The major findings from the collected data suggest that sustainability is under-represented in the curricula, while the current focus of teaching is on energy efficiency delivered through a fact-based approach. The reasons vary from lack of awareness, teaching material and suitable technologies, to the high effort required to teach sustainability. Finally, we provide recommendations for educators willing to teach sustainability in software engineering that can help to suit millennial students needs. △ Less

Submitted 3 March, 2017; originally announced March 2017.

Comments: The paper will be presented at the 1st International Workshop on Software Engineering Curricula for Millennials (SECM2017)

arXiv:1611.05994 [pdf, other]

doi 10.1109/TSE.2016.2616877

A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?

Authors: Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo

Abstract: Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD iterative nature. Aim: We formulate four process characteristic: seque… ▽ More Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD iterative nature. Aim: We formulate four process characteristic: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with the granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow. △ Less

Submitted 18 November, 2016; originally announced November 2016.

Showing 1–49 of 49 results for author: Fucci, D