-
Assessing the Use of AutoML for Data-Driven Software Engineering
Authors:
Fabio Calefato,
Luigi Quaranta,
Filippo Lanubile,
Marcos Kalinowski
Abstract:
Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
Submitted 20 July, 2023;
originally announced July 2023.
-
A Lot of Talk and a Badge: An Exploratory Analysis of Personal Achievements in GitHub
Authors:
Fabio Calefato,
Luigi Quaranta,
Filippo Lanubile
Abstract:
Context. GitHub has introduced a new gamification element through personal achievements, whereby badges are unlocked and displayed on developers' personal profile pages in recognition of their development activities. Objective. In this paper, we present an exploratory analysis using mixed methods to study the diffusion of personal badges in GitHub, in addition to the effects and reactions to their introduction. Method. First, we conduct an observational study by mining longitudinal data from more than 6,000 developers and performing correlation and regression analysis. Then, we conduct a survey and analyze over 300 GitHub community discussions on the topic of personal badges to gauge how the community responded to the introduction of the new feature. Results. We find that most of the developers sampled own at least one badge, but we also observe an increasing number of users who choose to keep their profile private and opt out of displaying badges. Besides, badges are generally poorly correlated with developers' qualities and dispositions such as timeliness and desire to collaborate. We also find that, except for the Starstruck badge (reflecting the number of followers), their introduction does not have an effect. Finally, the reaction of the community has been in general mixed, as developers find them appealing in principle but without a clear purpose and hardly reflecting their abilities in the current form. Conclusions. We provide recommendations to GitHub platform designers on how to improve the current implementation of personal badges as both a gamification mechanism and as sources of reliable cues of ability for developers' assessment.
Submitted 2 February, 2024; v1 submitted 26 March, 2023;
originally announced March 2023.
-
Teaching MLOps in Higher Education through Project-Based Learning
Authors:
Filippo Lanubile,
Silverio Martínez-Fernández,
Luigi Quaranta
Abstract:
Building and maintaining production-grade ML-enabled components is a complex endeavor that goes beyond the current approach of academic education, focused on the optimization of ML model performance in the lab. In this paper, we present a project-based learning approach to teaching MLOps, focused on the demonstration and experience with emerging practices and tools to automatize the construction of ML-enabled components. We examine the design of a course based on this approach, including laboratory sessions that cover the end-to-end ML component life cycle, from model building to production deployment. Moreover, we report on preliminary results from the first edition of the course. During the present year, an updated version of the same course is being delivered in two independent universities; the related learning outcomes will be evaluated to analyze the effectiveness of project-based learning for this specific subject.
Submitted 2 February, 2023;
originally announced February 2023.
-
A Preliminary Investigation of MLOps Practices in GitHub
Authors:
Fabio Calefato,
Filippo Lanubile,
Luigi Quaranta
Abstract:
Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.
Submitted 23 September, 2022;
originally announced September 2022.
-
Pynblint: a Static Analyzer for Python Jupyter Notebooks
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towards the productization of ML models. To foster the creation of better notebooks, we developed Pynblint, a static analyzer for Jupyter notebooks written in Python. The tool checks the compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices and provides targeted recommendations when violations are detected.
Submitted 24 May, 2022;
originally announced May 2022.
-
Eliciting Best Practices for Collaboration with Computational Notebooks
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototyping over writing code of quality.
Submitted 15 February, 2022;
originally announced February 2022.
-
An in-depth Analysis of Occasional and Recurring Collaborations in Online Music Co-creation
Authors:
Fabio Calefato,
Giuseppe Iaffaldano,
Leonardo Trisolini,
Filippo Lanubile
Abstract:
The success of online creative communities depends on the will of participants to create and derive content in a collaborative environment. Despite their growing popularity, the factors that lead to remixing existing content in online creative communities are not entirely understood. In this paper, we focus on overdubbing, that is, a dyadic collaboration where one author mixes one new track with an audio recording previously uploaded by another. We study musicians who collaborate regularly, that is, frequently overdub each other's songs. Building on frequent pattern mining techniques, we develop an approach to seek instances of such recurring collaborations in the Songtree community. We identify 43 instances involving two or three members with a similar reputation in the community. Our findings highlight common and different remix factors in occasional and recurring collaborations. Specifically, fresh and less mature songs are generally overdubbed more; instead, exchanging messages and invitations to collaborate are significant factors only for songs generated through recurring collaborations whereas author reputation (ranking) and applying metadata tags to songs have a positive effect only in occasional collaborations.
Submitted 26 October, 2021;
originally announced October 2021.
-
Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?
Authors:
Fabio Calefato,
Filippo Lanubile
Abstract:
Assessing the personality of software engineers may help to match individual traits with the characteristics of development activities such as code review and testing, as well as support managers in team composition. However, self-assessment questionnaires are not a practical solution for collecting multiple observations on a large scale. Instead, automatic personality detection, while overcoming these limitations, is based on off-the-shelf solutions trained on non-technical corpora, which might not be readily applicable to technical domains like Software Engineering (SE). In this paper, we first assess the performance of general-purpose personality detection tools when applied to a technical corpus of developers' emails retrieved from the public archives of the Apache Software Foundation. We observe a general low accuracy of predictions and an overall disagreement among the tools. Second, we replicate two previous research studies in SE by replacing the personality detection tool used to infer developers' personalities from pull-request discussions and emails. We observe that the original results are not confirmed, i.e., changing the tool used in the original study leads to diverging conclusions. Our results suggest a need for personality detection tools specially targeted for the software engineering domain.
Submitted 22 October, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners with any level of expertise. We describe how we built KGTorrent, and provide instructions on how to use it and refresh the collection to keep it up to date. Our vision is that the research community will use KGTorrent to study how data scientists, especially practitioners, use Jupyter Notebook in the wild and identify potential shortcomings to inform the design of its future extensions.
Submitted 18 March, 2021;
originally announced March 2021.
-
Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists
Authors:
Filippo Lanubile,
Fabio Calefato,
Luigi Quaranta,
Maddalena Amoruso,
Fabio Fumarola,
Michele Filannino
Abstract:
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers. In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners. Starting from the need for making AI experiments reproducible, the main themes that emerged are related to the use of the Jupyter Notebook as the primary prototyping tool, and the lack of support for software engineering best practices as well as data science specific functionalities.
Submitted 18 March, 2021;
originally announced March 2021.
-
Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub
Authors:
Fabio Calefato,
Marco Aurelio Gerosa,
Giuseppe Iaffaldano,
Filippo Lanubile,
Igor Steinmacher
Abstract:
Several Open Source Software (OSS) projects depend on the continuity of their development communities to remain sustainable. Understanding how developers become inactive or why they take breaks can help communities prevent abandonment and incentivize developers to come back. In this paper, we propose a novel method to identify developers' inactive periods by analyzing the individual rhythm of contributions to the projects. Using this method, we quantitatively analyze the inactivity of core developers in 18 OSS organizations hosted on GitHub. We also survey core developers to receive their feedback about the identified breaks and transitions. Our results show that our method was effective for identifying developers' breaks. About 94% of the surveyed core developers agreed with our state model of inactivity; 71% and 79% of them acknowledged their breaks and state transition, respectively. We also show that all core developers take breaks (at least once) and about a half of them (~45%) have completely disengaged from a project for at least one year. We also analyzed the probability of transitions to/from inactivity and found that developers who pause their activity have a ~35-55% chance to return to an active state; yet, if the break lasts for a year or longer, then the probability of resuming activities drops to ~21-26%, with a ~54% chance of complete disengagement. These results may support the creation of policies and mechanisms to make OSS community managers aware of breaks and potential project abandonment.
Submitted 30 June, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Assessment of Off-the-Shelf SE-specific Sentiment Analysis Tools: An Extended Replication Study
Authors:
Nicole Novielli,
Fabio Calefato,
Filippo Lanubile,
Alexander Serebrenik
Abstract:
Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools, specific for software engineering (SE), have been developed. We investigate to what extent SE-specific tools for sentiment analysis mitigate the threats to conclusion validity of empirical studies in software engineering, highlighted by previous research. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grain level, when used 'off-the-shelf'. Conversely, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.
Submitted 19 February, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Love, Joy, Anger, Sadness, Fear, and Surprise: SE Needs Special Kinds of AI: A Case Study on Text Mining and SE
Authors:
Nicole Novielli,
Fabio Calefato,
Filippo Lanubile
Abstract:
Do you like your code? What kind of code makes developers happiest? What makes them angriest? Is it possible to monitor the mood of a large team of coders to determine when and where a codebase needs additional help?
Submitted 23 April, 2020;
originally announced April 2020.
-
Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?
Authors:
Nicole Novielli,
Fabio Calefato,
Davide Dongiovanni,
Daniela Girardi,
Filippo Lanubile
Abstract:
In this paper, we address the problem of using sentiment analysis tools 'off-the-shelf,' that is when a gold standard is not available for retraining. We evaluate the performance of four SE-specific tools in a cross-platform setting, i.e., on a test set collected from data sources different from the one used for training. We find that (i) the lexicon-based tools outperform the supervised approaches retrained in a cross-platform setting and (ii) retraining can be beneficial in within-platform settings in the presence of robust gold standard datasets, even using a minimal training set. Based on our empirical findings, we derive guidelines for reliable use of sentiment analysis tools in software engineering.
Submitted 1 April, 2020;
originally announced April 2020.
-
A Case Study on Tool Support for Collaboration in Agile Development
Authors:
Fabio Calefato,
Andrea Giove,
Marco Losavio,
Filippo Lanubile
Abstract:
We report on a longitudinal case study conducted at the Italian site of a large software company to further our understanding of how development and communication tools can be improved to better support agile practices and collaboration. After observing inconsistencies in the way communication tools (i.e., email, Skype, and Slack) were used, we first reinforced the use of Slack as the central hub for internal communication, while setting clear rules regarding tools usage. As a second main change, we refactored the Jira Scrum board into two separate boards, a detailed one for developers and a high-level one for managers, while also introducing automation rules and the integration with Slack. The first change revealed that the teams of developers used and appreciated Slack differently with the QA team being the most favorable and that the use of channels is hindered by automatic notifications from development tools (e.g., Jenkins). The findings from the second change show that 85% of the interviewees reported perceived improvements in their workflow. Despite the limitations due to the single nature of the reported case, we highlight the importance for companies to reflect on how to properly set up their agile work environment to improve communication and facilitate collaboration.
Submitted 1 April, 2020;
originally announced April 2020.
-
Recognizing Developers' Emotions while Programming
Authors:
Daniela Girardi,
Nicole Novielli,
Davide Fucci,
Filippo Lanubile
Abstract:
Developers experience a wide range of emotions during programming tasks, which may have an impact on job performance. In this paper, we present an empirical study aimed at (i) investigating the link between emotion and progress, (ii) understanding the triggers for developers' emotions and the strategies to deal with negative ones, and (iii) identifying the minimal set of non-invasive biometric sensors for emotion recognition during programming tasks. Results confirm previous findings about the relation between emotions and perceived productivity. Furthermore, we show that developers' emotions can be reliably recognized using only a wristband capturing the electrodermal activity and heart-related metrics.
Submitted 6 May, 2021; v1 submitted 24 January, 2020;
originally announced January 2020.
-
A large-scale, in-depth analysis of developers' personalities in the Apache ecosystem
Authors:
Fabio Calefato,
Filippo Lanubile,
Bogdan Vasilescu
Abstract:
Context: Large-scale distributed projects are typically the results of collective efforts performed by multiple developers with heterogeneous personalities. Objective: We aim to find evidence that personalities can explain developers' behavior in large-scale distributed projects. For example, the propensity to trust others - a critical factor for the success of global software engineering - has been found to influence positively the result of code reviews in distributed projects. Method: In this paper, we perform a quantitative analysis of ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large-scale distributed projects. Results: We find that there are three common types of personality profiles among Apache developers, characterized in particular by their level of Agreeableness and Neuroticism. We also confirm that developers' personality is stable over time. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open developers are more likely to become contributors to Apache projects. Conclusion: Overall, our findings reinforce the need for future studies on human factors in software engineering to use psychometric tools to control for differences in developers' personalities.
Submitted 13 April, 2022; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Why do developers take breaks from contributing to OSS projects? A preliminary analysis
Authors:
Giuseppe Iaffaldano,
Igor Steinmacher,
Fabio Calefato,
Marco Gerosa,
Filippo Lanubile
Abstract:
Creating a successful and sustainable Open Source Software (OSS) project often depends on the strength and the health of the community behind it. Current literature explains the contributors' lifecycle, starting with the motivations that drive people to contribute and barriers to joining OSS projects, covering developers' evolution until they become core members. However, the stages when developers leave the projects are still weakly explored and are not well-defined in existing developers' lifecycle models. In this position paper, we enrich the knowledge about the leaving stage by identifying sleeping and dead states, representing temporary and permanent breaks that developers take from contributing. We conducted a preliminary set of semi-structured interviews with active developers. We analyzed the answers by focusing on defining and understanding the reasons for the transitions to/from sleeping and dead states. This paper raises new questions that may guide further discussions and research, which may ultimately benefit OSS communities.
Submitted 29 July, 2021; v1 submitted 22 March, 2019;
originally announced March 2019.
-
EMTk -- The Emotion Mining Toolkit
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli,
Luigi Quaranta
Abstract:
The Emotion Mining Toolkit (EMTk) is a suite of modules and datasets offering a comprehensive solution for mining sentiment and emotions from technical text contributed by developers on communication channels. The toolkit is written in Java, Python, and R, and is released under the MIT open source license. In this paper, we describe its architecture and the benchmark against the previous, standalone versions of our sentiment analysis tools. Results show large improvements in terms of speed.
Submitted 12 April, 2021; v1 submitted 22 March, 2019;
originally announced March 2019.
-
An empirical assessment of best-answer prediction models in technical Q&A sites
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli
Abstract:
Technical Q&A sites have become essential for software engineers as they constantly seek help from other experts to solve their work problems. Despite their success, many questions remain unresolved, sometimes because the asker does not acknowledge any helpful answer. In these cases, an information seeker can only browse all the answers within a question thread to assess their quality as potential solutions. We approach this time-consuming problem as a binary-classification task where a best-answer prediction model is built to identify the accepted answer among those within a resolved question thread, and the candidate solutions to those questions that have received answers but are still unresolved. In this paper, we report on a study aimed at assessing 26 best-answer prediction models in two steps. First, we study how models perform when predicting best answers in Stack Overflow, the most popular Q&A site for software engineers. Then, we assess performance in a cross-platform setting where the prediction models are trained on Stack Overflow and tested on other technical Q&A sites. Our findings show that the choice of the classifier and automated parameter tuning have a large impact on the prediction of the best answer. We also demonstrate that our approach to the best-answer prediction problem is generalizable across technical Q&A sites. Finally, we provide practical recommendations to Q&A platform designers to curate and preserve the crowdsourced knowledge shared through these sites.
Submitted 22 March, 2019;
originally announced March 2019.
-
A Replication Study on Code Comprehension and Expertise using Lightweight Biometric Sensors
Authors:
Davide Fucci,
Daniela Girardi,
Nicole Novielli,
Luigi Quaranta,
Filippo Lanubile
Abstract:
Code comprehension has been recently investigated from physiological and cognitive perspectives through the use of medical imaging. Floyd et al. (i.e., the original study) used fMRI to classify the type of comprehension tasks performed by developers and relate such results to their expertise. We replicate the original study using lightweight biometric sensors, which participants (28 undergrads in computer science) wore when performing comprehension tasks on source code and natural language prose. We developed machine learning models to automatically identify what kind of tasks developers are working on, leveraging their brain-, heart-, and skin-related signals. The best improvement over the original study performance is achieved using solely the heart signal obtained through a single device (BAC 87% vs. 79.1%). Differently from the original study, we were not able to observe a correlation between the participants' expertise and the classifier performance (tau = 0.16, p = 0.31). Our findings show that lightweight biometric sensors can be used to accurately recognize comprehension tasks, opening interesting scenarios for research and practice.
Submitted 2 April, 2019; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Investigating Crowd Creativity in Online Music Communities
Authors:
Fabio Calefato,
Giuseppe Iaffaldano,
Filippo Lanubile,
Federico Maiorano
Abstract:
Crowd creativity is typically associated with peer-production communities focusing on artistic products like animations, video games, and music, but less frequently with Open Source Software (OSS), even though developers, too, must be creative to come up with new solutions to their technical challenges. In this paper, we conduct a study to further the understanding of which factors from prior work in both OSS and art communities are predictive of successful collaboration, defined as reuse of previous songs, in three different songwriting communities, namely Songtree, Splice, and ccMixter. The main findings from this study confirm that the success of collaborations is associated with high community status of recognizable authors and low degree of derivativity of songs.
Submitted 31 May, 2021; v1 submitted 14 September, 2018;
originally announced September 2018.
-
A Benchmark Study on Sentiment Analysis for Software Engineering Research
Authors:
Nicole Novielli,
Daniela Girardi,
Filippo Lanubile
Abstract:
A recent research trend has emerged to identify developers' emotions, by applying sentiment analysis to the content of communication traces left in collaborative development environments. Trying to overcome the limitations posed by using off-the-shelf sentiment analysis tools, researchers recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts.
Submitted 17 March, 2018;
originally announced March 2018.
-
A Gold Standard for Emotion Annotation in Stack Overflow
Authors:
Nicole Novielli,
Fabio Calefato,
Filippo Lanubile
Abstract:
Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is leveraging sentiment analysis of developers' communication traces. We release a dataset of 4,800 questions, answers, and comments from Stack Overflow, manually annotated for emotions. Our dataset contributes to the building of a shared corpus of annotated resources to support research on emotion awareness in software development.
Submitted 6 March, 2018;
originally announced March 2018.
-
On Developers' Personality in Large-scale Distributed Projects: The Case of the Apache Ecosystem
Authors:
Fabio Calefato,
Giuseppe Iaffaldano,
Filippo Lanubile,
Bogdan Vasilescu
Abstract:
Large-scale distributed projects are typically the results of collective efforts performed by multiple developers, each one having a different personality. The study of developers' personalities has the potential of explaining their behavior in various contexts. For example, the propensity to trust others, a critical factor in the success of global software engineering, has been found to positively influence the result of code reviews in distributed projects. In this paper, we perform a quantitative analysis of developers' personality in open source software projects, intended as an extreme form of distributed projects in which no single organization controls the project. We mine ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large-scale distributed projects. We find that developers become over time more conscientious, agreeable, and neurotic. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open and more agreeable developers are more likely to become project contributors.
Submitted 24 September, 2018; v1 submitted 3 March, 2018;
originally announced March 2018.
-
How to Ask for Technical Help? Evidence-based Guidelines for Writing Questions on Stack Overflow
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli
Abstract:
Context: The success of Stack Overflow and other community-based question-and-answer (Q&A) sites depends mainly on the will of their members to answer others' questions. In fact, when formulating requests on Q&A sites, we are not simply seeking information. Instead, we are also asking for other people's help and feedback. Understanding the dynamics of the participation in Q&A communities is essential to improve the value of crowdsourced knowledge.
Objective: In this paper, we investigate how information seekers can increase the chance of eliciting a successful answer to their questions on Stack Overflow by focusing on the following actionable factors: affect, presentation quality, and time.
Method: We develop a conceptual framework of factors potentially influencing the success of questions in Stack Overflow. We quantitatively analyze a set of over 87K questions from the official Stack Overflow dump to assess the impact of actionable factors on the success of technical requests. The information seeker reputation is included as a control factor. Furthermore, to understand the role played by affective states in the success of questions, we qualitatively analyze questions containing positive and negative emotions. Finally, a survey is conducted to understand how Stack Overflow users perceive the guideline suggestions for writing questions.
Results: We found that, regardless of user reputation, successful questions are short, contain code snippets, and do not overuse uppercase characters. As regards affect, successful questions adopt a neutral emotional style.
Conclusion: We provide evidence-based guidelines for writing effective questions on Stack Overflow that software engineers can follow to increase the chance of getting technical help. As for the role of affect, we empirically confirmed community guidelines that suggest avoiding rudeness in question writing.
Submitted 24 November, 2017; v1 submitted 12 October, 2017;
originally announced October 2017.
-
Collaboration Success Factors in an Online Music Community
Authors:
Fabio Calefato,
Giuseppe Iaffaldano,
Filippo Lanubile
Abstract:
Online communities have been able to develop large, open-source software (OSS) projects like Linux and Firefox through the successful collaborations carried out by their members over the Internet. However, online communities also involve creative arts domains such as animation, video games, and music. Despite their growing popularity, the factors that lead to successful collaborations in these communities are not entirely understood. In this paper, we present a study on creative collaboration in a music community where authors write songs together by 'overdubbing,' that is, by mixing a new track with an existing audio recording. We analyzed the relationship between song- and author-related measures and the likelihood of a song being overdubbed. We found that recent songs, as well as songs with many reactions, are more likely to be overdubbed; authors with a high status in the community and a recognizable identity write songs that the community tends to build upon.
Submitted 18 October, 2017; v1 submitted 1 October, 2017;
originally announced October 2017.
-
Establishing Personal Trust-based Connections in Distributed Teams
Authors:
Fabio Calefato,
Filippo Lanubile
Abstract:
Trust is a factor that dramatically contributes to the success or failure of distributed software teams. We present a research model showing that social communication between distant developers enables the affective appraisal of trustworthiness even from a distance, thus increasing project performance. To overcome the limitations of self-reported data, typically questionnaires, we focus on software projects following a pull request-based development model and approximate the overall performance of a software project with the history of successful collaborations occurring between developers.
Submitted 1 October, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.
-
Sentiment Polarity Detection for Software Development
Authors:
Fabio Calefato,
Filippo Lanubile,
Federico Maiorano,
Nicole Novielli
Abstract:
The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
Submitted 25 September, 2017; v1 submitted 9 September, 2017;
originally announced September 2017.
-
Emotion Detection Using Noninvasive Low Cost Sensors
Authors:
Daniela Girardi,
Filippo Lanubile,
Nicole Novielli
Abstract:
Emotion recognition from biometrics is relevant to a wide range of application domains, including healthcare. Existing approaches usually adopt multi-electrode sensors that could be expensive or uncomfortable to use in real-life situations. In this study, we investigate whether we can reliably recognize high vs. low emotional valence and arousal by relying on noninvasive low-cost EEG, EMG, and GSR sensors. We report the results of an empirical study involving 19 subjects. We achieve state-of-the-art classification performance for both valence and arousal even in a cross-subject classification setting, which eliminates the need for individual training and tuning of classification models.
Submitted 22 August, 2017;
originally announced August 2017.
-
EmoTxt: A Toolkit for Emotion Recognition from Text
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli
Abstract:
We present EmoTxt, a toolkit for emotion recognition from text, trained and tested on a gold standard of about 9K questions, answers, and comments from online interactions. We provide empirical evidence of the performance of EmoTxt. To the best of our knowledge, EmoTxt is the first open-source toolkit supporting both emotion recognition from text and training of custom emotion classification models.
Submitted 19 January, 2018; v1 submitted 13 August, 2017;
originally announced August 2017.
-
Mining Communication Data in a Music Community: A Preliminary Analysis
Authors:
Fabio Calefato,
Giuseppe Iaffaldano,
Filippo Lanubile,
Antonio Lategano,
Nicole Novielli
Abstract:
Comments play an important role within online creative communities because they make it possible to foster the production and improvement of authors' artifacts. We investigate how comment-based communication helps shape members' behavior within online creative communities. In this paper, we report the results of a preliminary study aimed at mining the communication network of a music community for collaborative songwriting, where users collaborate online by first uploading new songs and then by adding new tracks and providing feedback in the form of comments.
Submitted 25 February, 2018; v1 submitted 12 May, 2017;
originally announced May 2017.
-
Bootstrapping a Lexicon for Emotional Arousal in Software Engineering
Authors:
Mika V. Mäntylä,
Nicole Novielli,
Filippo Lanubile,
Maëlick Claes,
Miikka Kuutila
Abstract:
Emotional arousal increases activation and performance but may also lead to burnout in software development. We present the first version of a Software Engineering Arousal lexicon (SEA) that is specifically designed to address the problem of emotional arousal in the software developer ecosystem. SEA is built using a bootstrapping approach that combines a word embedding model trained on issue-tracking data and manual scoring of items in the lexicon. We show that our lexicon is able to differentiate between issue priorities, which are a source of emotional activation and then act as a proxy for arousal. The best performance is obtained by combining SEA (428 words) with a previously created general-purpose lexicon by Warriner et al. (13,915 words), and it achieves Cohen's d effect sizes up to 0.5.
Submitted 27 March, 2017;
originally announced March 2017.
-
A Preliminary Analysis on the Effects of Propensity to Trust in Distributed Software Development
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli
Abstract:
Establishing trust between developers working at distant sites facilitates team collaboration in distributed software development. While previous research has focused on how to build and spread trust in the absence of direct, face-to-face communication, it has overlooked the effects of the propensity to trust, i.e., the personality trait representing the individual disposition to perceive others as trustworthy. In this study, we present a preliminary, quantitative analysis of how the propensity to trust affects the success of collaborations in a distributed project, where success is represented by pull requests whose code changes and contributions are successfully merged into the project's repository.
Submitted 3 October, 2017; v1 submitted 16 February, 2017;
originally announced February 2017.