-
A Multivocal Literature Review on the Benefits and Limitations of Automated Machine Learning Tools
Authors:
Kelly Azevedo,
Luigi Quaranta,
Fabio Calefato,
Marcos Kalinowski
Abstract:
Context. Advancements in Machine Learning (ML) are revolutionizing every application domain, driving unprecedented transformations and fostering innovation. However, despite these advances, several organizations are experiencing friction in the adoption of ML-based technologies, mainly due to the shortage of ML professionals. In this context, Automated Machine Learning (AutoML) techniques have bee…
▽ More
Context. Advancements in Machine Learning (ML) are revolutionizing every application domain, driving unprecedented transformations and fostering innovation. However, despite these advances, several organizations are experiencing friction in the adoption of ML-based technologies, mainly due to the shortage of ML professionals. In this context, Automated Machine Learning (AutoML) techniques have been presented as a promising solution to democratize ML adoption. Objective. We aim to provide an overview of the evidence on the benefits and limitations of using AutoML tools. Method. We conducted a multivocal literature review, which allowed us to identify 54 sources from the academic literature and 108 sources from the grey literature reporting on AutoML benefits and limitations. We extracted reported benefits and limitations from the papers and applied thematic analysis. Results. We identified 18 benefits and 25 limitations. Concerning the benefits, we highlight that AutoML tools can help streamline the core steps of ML workflows, namely data preparation, feature engineering, model construction, and hyperparameter tuning, with concrete benefits on model performance, efficiency, and scalability. In addition, AutoML empowers both novice and experienced data scientists, promoting ML accessibility. On the other hand, we highlight several limitations that may represent obstacles to the widespread adoption of AutoML. For instance, AutoML tools may introduce barriers to transparency and interoperability, exhibit limited flexibility for complex scenarios, and offer inconsistent coverage of the ML workflow. Conclusions. The effectiveness of AutoML in facilitating the adoption of machine learning by users may vary depending on the tool and the context in which it is used. As of today, AutoML tools are used to increase human expertise rather than replace it, and, as such, they require skilled users.
△ Less
Submitted 20 January, 2024;
originally announced January 2024.
-
Assessing the Use of AutoML for Data-Driven Software Engineering
Authors:
Fabio Calefato,
Luigi Quaranta,
Filippo Lanubile,
Marcos Kalinowski
Abstract:
Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that wo…
▽ More
Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams develo** AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
A Lot of Talk and a Badge: An Exploratory Analysis of Personal Achievements in GitHub
Authors:
Fabio Calefato,
Luigi Quaranta,
Filippo Lanubile
Abstract:
Context. GitHub has introduced a new gamification element through personal achievements, whereby badges are unlocked and displayed on developers' personal profile pages in recognition of their development activities. Objective. In this paper, we present an exploratory analysis using mixed methods to study the diffusion of personal badges in GitHub, in addition to the effects and reactions to their…
▽ More
Context. GitHub has introduced a new gamification element through personal achievements, whereby badges are unlocked and displayed on developers' personal profile pages in recognition of their development activities. Objective. In this paper, we present an exploratory analysis using mixed methods to study the diffusion of personal badges in GitHub, in addition to the effects and reactions to their introduction. Method. First, we conduct an observational study by mining longitudinal data from more than 6,000 developers and performed correlation and regression analysis. Then, we conduct a survey and analyze over 300 GitHub community discussions on the topic of personal badges to gauge how the community responded to the introduction of the new feature. Results. We find that most of the developers sampled own at least a badge, but we also observe an increasing number of users who choose to keep their profile private and opt out of displaying badges. Besides, badges are generally poorly correlated with developers' qualities and dispositions such as timeliness and desire to collaborate. We also find that, except for the Starstruck badge (reflecting the number of followers), their introduction does not have an effect. Finally, the reaction of the community has been in general mixed, as developers find them appealing in principle but without a clear purpose and hardly reflecting their abilities in the current form. Conclusions. We provide recommendations to GitHub platform designers on how to improve the current implementation of personal badges as both a gamification mechanism and as sources of reliable cues of ability for developers' assessment
△ Less
Submitted 2 February, 2024; v1 submitted 26 March, 2023;
originally announced March 2023.
-
Teaching MLOps in Higher Education through Project-Based Learning
Authors:
Filippo Lanubile,
Silverio MartÃnez-Fernández,
Luigi Quaranta
Abstract:
Building and maintaining production-grade ML-enabled components is a complex endeavor that goes beyond the current approach of academic education, focused on the optimization of ML model performance in the lab. In this paper, we present a project-based learning approach to teaching MLOps, focused on the demonstration and experience with emerging practices and tools to automatize the construction o…
▽ More
Building and maintaining production-grade ML-enabled components is a complex endeavor that goes beyond the current approach of academic education, focused on the optimization of ML model performance in the lab. In this paper, we present a project-based learning approach to teaching MLOps, focused on the demonstration and experience with emerging practices and tools to automatize the construction of ML-enabled components. We examine the design of a course based on this approach, including laboratory sessions that cover the end-to-end ML component life cycle, from model building to production deployment. Moreover, we report on preliminary results from the first edition of the course. During the present year, an updated version of the same course is being delivered in two independent universities; the related learning outcomes will be evaluated to analyze the effectiveness of project-based learning for this specific subject.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
A Preliminary Investigation of MLOps Practices in GitHub
Authors:
Fabio Calefato,
Filippo Lanubile,
Luigi Quaranta
Abstract:
Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model…
▽ More
Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
Assessing the Quality of Computational Notebooks for a Frictionless Transition from Exploration to Production
Authors:
Luigi Quaranta
Abstract:
The massive trend of integrating data-driven AI capabilities into traditional software systems is rising new intriguing challenges. One of such challenges is achieving a smooth transition from the explorative phase of Machine Learning projects - in which data scientists build prototypical models in the lab - to their production phase - in which software engineers translate prototypes into producti…
▽ More
The massive trend of integrating data-driven AI capabilities into traditional software systems is rising new intriguing challenges. One of such challenges is achieving a smooth transition from the explorative phase of Machine Learning projects - in which data scientists build prototypical models in the lab - to their production phase - in which software engineers translate prototypes into production-ready AI components. To narrow down the gap between these two phases, tools and practices adopted by data scientists might be improved by incorporating consolidated software engineering solutions. In particular, computational notebooks have a prominent role in determining the quality of data science prototypes. In my research project, I address this challenge by studying the best practices for collaboration with computational notebooks and proposing proof-of-concept tools to foster guidelines compliance.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Pynblint: a Static Analyzer for Python Jupyter Notebooks
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towar…
▽ More
Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towards the productization of ML models. To foster the creation of better notebooks, we developed Pynblint, a static analyzer for Jupyter notebooks written in Python. The tool checks the compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices and provides targeted recommendations when violations are detected.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Eliciting Best Practices for Collaboration with Computational Notebooks
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with profe…
▽ More
Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototy** over writing code of quality.
△ Less
Submitted 15 February, 2022;
originally announced February 2022.
-
KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle
Authors:
Luigi Quaranta,
Fabio Calefato,
Filippo Lanubile
Abstract:
Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata ret…
▽ More
Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners with any levels of expertise. We describe how we built KGTorrent, and provide instructions on how to use it and refresh the collection to keep it up to date. Our vision is that the research community will use KGTorrent to study how data scientists, especially practitioners, use Jupyter Notebook in the wild and identify potential shortcomings to inform the design of its future extensions.
△ Less
Submitted 18 March, 2021;
originally announced March 2021.
-
Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists
Authors:
Filippo Lanubile,
Fabio Calefato,
Luigi Quaranta,
Maddalena Amoruso,
Fabio Fumarola,
Michele Filannino
Abstract:
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers. In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners. Starting from the need for making AI experiments reproducible, the main themes that emerged are related to the use o…
▽ More
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers. In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners. Starting from the need for making AI experiments reproducible, the main themes that emerged are related to the use of the Jupyter Notebook as the primary prototy** tool, and the lack of support for software engineering best practices as well as data science specific functionalities.
△ Less
Submitted 18 March, 2021;
originally announced March 2021.
-
EMTk -- The Emotion Mining Toolkit
Authors:
Fabio Calefato,
Filippo Lanubile,
Nicole Novielli,
Luigi Quaranta
Abstract:
The Emotion Mining Toolkit (EMTk) is a suite of modules and datasets offering a comprehensive solution for mining sentiment and emotions from technical text contributed by developers on communication channels. The toolkit is written in Java, Python, and R, and is released under the MIT open source license. In this paper, we describe its architecture and the benchmark against the previous, standalo…
▽ More
The Emotion Mining Toolkit (EMTk) is a suite of modules and datasets offering a comprehensive solution for mining sentiment and emotions from technical text contributed by developers on communication channels. The toolkit is written in Java, Python, and R, and is released under the MIT open source license. In this paper, we describe its architecture and the benchmark against the previous, standalone versions of our sentiment analysis tools. Results show large improvements in terms of speed.
△ Less
Submitted 12 April, 2021; v1 submitted 22 March, 2019;
originally announced March 2019.
-
A Replication Study on Code Comprehension and Expertise using Lightweight Biometric Sensors
Authors:
Davide Fucci,
Daniela Girardi,
Nicole Novielli,
Luigi Quaranta,
Filippo Lanubile
Abstract:
Code comprehension has been recently investigated from physiological and cognitive perspectives through the use of medical imaging. Floyd et al (i.e., the original study) used fMRI to classify the type of comprehension tasks performed by developers and relate such results to their expertise. We replicate the original study using lightweight biometrics sensors which participants (28 undergrads in c…
▽ More
Code comprehension has been recently investigated from physiological and cognitive perspectives through the use of medical imaging. Floyd et al (i.e., the original study) used fMRI to classify the type of comprehension tasks performed by developers and relate such results to their expertise. We replicate the original study using lightweight biometrics sensors which participants (28 undergrads in computer science) wore when performing comprehension tasks on source code and natural language prose. We developed machine learning models to automatically identify what kind of tasks developers are working on leveraging their brain-, heart-, and skin-related signals. The best improvement over the original study performance is achieved using solely the heart signal obtained through a single device (BAC 87% vs. 79.1%). Differently from the original study, we were not able to observe a correlation between the participants' expertise and the classifier performance (tau = 0.16, p = 0.31). Our findings show that lightweight biometric sensors can be used to accurately recognize comprehension tasks opening interesting scenarios for research and practice.
△ Less
Submitted 2 April, 2019; v1 submitted 8 March, 2019;
originally announced March 2019.