-
Please do not go: understanding turnover of software engineers from different perspectives
Authors:
Michelle Larissa Luciano Carvalho,
Paulo da Silva Cruz,
Eduardo Santana de Almeida,
Paulo Anselmo da Mota Silveira Neto,
Rafael Prikladnicki
Abstract:
Turnover is the movement of professional employees into and out of a company in a given period. Such a phenomenon significantly impacts the software industry since it generates knowledge loss, delays in the schedule, and increased costs in the final project. Despite the efforts made by researchers and professionals to minimize turnover, more studies are needed to understand the motivation that drives Software Engineers to leave their jobs and the main strategies CEOs adopt to retain these professionals in software development companies. In this paper, we contribute a mixed methods study involving semi-structured interviews with Software Engineers and CEOs to obtain a wider opinion of these professionals about turnover and a subsequent validation survey with additional software engineers to check and review the insights from interviews. In studying such aspects, we identified 19 different reasons for software engineers' turnover and 18 more efficient strategies used in the software development industry to reduce it. Our findings provide several implications for industry and academia, which can drive future research.
Submitted 28 June, 2024;
originally announced July 2024.
-
Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization
Authors:
Afonso de Sá Delgado Neto,
Maximilian Egger,
Mayank Bakshi,
Rawad Bitar
Abstract:
We introduce CYBER-0, the first zero-order optimization algorithm for memory-and-communication efficient Federated Learning, resilient to Byzantine faults. We show through extensive numerical experiments on the MNIST dataset and finetuning RoBERTa-Large that CYBER-0 outperforms state-of-the-art algorithms in terms of communication and memory efficiency while reaching similar accuracy. We provide theoretical guarantees on its convergence for convex loss functions.
Submitted 20 June, 2024;
originally announced June 2024.
-
Unveiling Assumptions: Exploring the Decisions of AI Chatbots and Human Testers
Authors:
Francisco Gomes de Oliveira Neto
Abstract:
The integration of Large Language Models (LLMs) and chatbots introduces new challenges and opportunities for decision-making in software testing. Decision-making relies on a variety of information, including code, requirements specifications, and other software artifacts that are often unclear or exist solely in the developer's mind. To fill in the gaps left by unclear information, we often rely on assumptions, intuition, or previous experiences to make decisions. This paper explores the potential of LLM-based chatbots like Bard, Copilot, and ChatGPT, to support software testers in test decisions such as prioritizing test cases effectively. We investigate whether LLM-based chatbots and human testers share similar "assumptions" or intuition in prohibitive testing scenarios where exhaustive execution of test cases is often impractical. Preliminary results from a survey of 127 testers indicate a preference for diverse test scenarios, with a significant majority (96%) favoring dissimilar test sets. Interestingly, two out of four chatbots mirrored this preference, aligning with human intuition, while the others opted for similar test scenarios, chosen by only 3.9% of testers. Our initial insights suggest a promising avenue within the context of enhancing the collaborative dynamics between testers and chatbots.
Submitted 17 June, 2024;
originally announced June 2024.
-
From Human-to-Human to Human-to-Bot Conversations in Software Engineering
Authors:
Ranim Khojah,
Francisco Gomes de Oliveira Neto,
Philipp Leitner
Abstract:
Software developers use natural language to interact not only with other humans, but increasingly also with chatbots. These interactions have different properties and flow differently based on what goal the developer wants to achieve and who they interact with. In this paper, we aim to understand the dynamics of conversations that occur during modern software development after the integration of AI and chatbots, enabling a deeper recognition of the advantages and disadvantages of including chatbot interactions in addition to human conversations in collaborative work. We compile existing conversation attributes with humans and NLU-based chatbots and adapt them to the context of software development. Then, we extend the comparison to include LLM-powered chatbots based on an observational study. We present similarities and differences between human-to-human and human-to-bot conversations, also distinguishing between NLU- and LLM-based chatbots. Furthermore, we discuss how understanding the differences among the conversation styles guides the developer on how to shape their expectations from a conversation and consequently support the communication within a software team. We conclude that the recent conversation styles that we observe with LLM-chatbots can not replace conversations with humans due to certain attributes regarding social aspects despite their ability to support productivity and decrease the developers' mental load.
Submitted 21 May, 2024;
originally announced May 2024.
-
Breaking Barriers: Investigating the Sense of Belonging Among Women and Non-Binary Students in Software Engineering
Authors:
Lina Boman,
Jonatan Andersson,
Francisco Gomes de Oliveira Neto
Abstract:
Women in computing were among the first programmers in the early 20th century and were substantial contributors to the industry. Today, men dominate the software engineering industry. Research and data show that women are far less likely to pursue a career in this industry, and those that do are less likely than men to stay in it. Reasons for women and other underrepresented minorities to leave the industry are a lack of opportunities for growth and advancement, unfair treatment and workplace culture. This research explores how the potential to cultivate or uphold an industry unfavourable to women and non-binary individuals manifests in software engineering education at the university level. For this purpose, the study includes surveys and interviews. We use gender name perception as a survey instrument, and the results show small differences in perceptions of software engineering students based on their gender. Particularly, the survey respondents anchor the values of the male software engineer (Hans) to a variety of technical and non-technical skills, while the same description for a female software engineer (Hanna) is anchored mainly by her managerial skills. With interviews with women and non-binary students, we gain insight on the main barriers to their sense of ambient belonging. The collected data shows that some known barriers from the literature such as tokenism, and stereotype threat, do still exist. However, we find positive factors such as role models and encouragement that strengthen the sense of belonging among these students.
Submitted 6 May, 2024;
originally announced May 2024.
-
Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice
Authors:
Ranim Khojah,
Mazen Mohamad,
Philipp Leitner,
Francisco Gomes de Oliveira Neto
Abstract:
Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.
Submitted 21 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
The use of the open innovation paradigm in the public sector: a systematic review of published studies
Authors:
Joel Alves de Lima Júnior,
Kiev Gama,
Jorge da Silva Correia Neto
Abstract:
The use of the open innovation paradigm has been, over the past years, getting special attention in the public sector. Motivated by an urban environment that is increasingly more complex and challenging, several government agencies have been allocating financial resources and efforts to promote open and participative government initiatives. As a way to try and understand this scenario, a systematic review of the literature was conducted, to provide a comprehensive analysis of the scientific papers that were published, seeking to capture, classify, evaluate and synthesize how the use of this paradigm has been put into practice in the public sector. In total, 4,741 preliminary studies were analyzed. From this number, only 37 articles were classified as potentially relevant and moved forward, going through the process of data extraction and analysis. From the data obtained, it was possible to verify that the use of this paradigm started to be reported with a higher frequency in the literature since 2013 and, among the main findings, we highlight the reports of experiences, approach propositions, of understanding how the phenomenon occurs and theoretical reflections. It was also possible to verify that the use of open innovation through social media was one of the pioneer techniques of engagement between the public sector and citizens. In conclusion, the reports confirm that the main challenges of this paradigm applied to the public sector are associated with their respective bureaucratic aspects, therefore lacking a bigger reflection on the procedures and methods to be used in the public sphere.
Submitted 8 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Logic-based Explanations for Linear Support Vector Classifiers with Reject Option
Authors:
Francisco Mateus Rocha Filho,
Thiago Alves Rocha,
Reginaldo Pereira Fernandes Ribeiro,
Ajalmar Rêgo da Rocha Neto
Abstract:
Support Vector Classifier (SVC) is a well-known Machine Learning (ML) model for linear classification problems. It can be used in conjunction with a reject option strategy to reject instances that are hard to correctly classify and delegate them to a specialist. This further increases the confidence of the model. Given this, obtaining an explanation of the cause of rejection is important to not blindly trust the obtained results. While most of the related work has developed means to give such explanations for machine learning models, to the best of our knowledge none have done so for when reject option is present. We propose a logic-based approach with formal guarantees on the correctness and minimality of explanations for linear SVCs with reject option. We evaluate our approach by comparing it to Anchors, which is a heuristic algorithm for generating explanations. Obtained results show that our proposed method gives shorter explanations with reduced time cost.
Submitted 24 March, 2024;
originally announced March 2024.
-
Nanomechanically Induced Transparency
Authors:
E. C. Diniz,
O. P. de Sá Neto
Abstract:
In this paper, we investigate a nanomechanically induced transparency (NIT) effect that arises from the coupling of a nanoelectromechanical system and a trapped ion. By confining the ion in mesoscopic traps and capacitively coupling it with a nanoelectromechanical system suspended as electrodes, the research is intricately focussed on the implications of including the ion's degrees of freedom. The Lamb--Dicke approximation is crucial to understanding the effects of phonon exchange with electronic qubits and revealing transparency phenomena in this unique coupling. The results underline the importance of the Lamb--Dicke approximation in modelling the effects of transparency windows in nanoelectromechanical systems.
Submitted 1 February, 2024;
originally announced February 2024.
-
Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians
Authors:
Matheus Schreiner Homrich da Silva,
Paulo Brossard de Souza Pinto Neto,
Rodolfo Migon Favaretto,
Soraia Raupp Musse
Abstract:
Security is an important topic in our contemporary world, and the ability to automate the detection of any events of interest that can take place in a crowd is of great interest to a population. We hypothesize that the detection of events in videos is correlated with significant changes in pedestrian behaviors. In this paper, we examine three different scenarios of crowd behavior, containing both the cases where an event triggers a change in the behavior of the crowd and two video sequences where the crowd and its motion remain mostly unchanged. With both the videos and the tracking of the individual pedestrians (performed in a pre-processing phase), we use GeoMind, software we developed to extract significant data about the scene, in particular, the geometrical features, personalities, and emotions of each person. We then examine the output, seeking a significant change in the way each person acts as a function of time that could be used as a basis to identify events or to model realistic crowd actions. When applied to the games area, our method can use the detected events to find some sort of pattern to be then used in agent simulation. Results indicate that our hypothesis seems valid in the sense that the visually observed events could be automatically detected using GeoMind.
Submitted 11 December, 2023;
originally announced December 2023.
-
Portuguese FAQ for Financial Services
Authors:
Paulo Finardi,
Wanderley M. Melo,
Edgard D. Medeiros Neto,
Alex F. Mansano,
Pablo B. Costa,
Vinicius F. Caridá
Abstract:
Scarcity of domain-specific data in the Portuguese financial domain has disfavored the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.
Submitted 19 November, 2023;
originally announced November 2023.
-
A note on improving the search of optimal prices in envy-free perfect matchings
Authors:
Marcos Salvatierra,
Juan G. Colonna,
Mario Salvatierra Jr.,
Alcides de C. Amorim Neto
Abstract:
We present a method for finding envy-free prices in a combinatorial auction where the consumers' number $n$ coincides with that of distinct items for sale, each consumer can buy one single item and each item has only one unit available. This is a particular case of the {\it unit-demand envy-free pricing problem}, and was recently revisited by Arbib et al. (2019). These authors proved that using a Fibonacci heap for solving the maximum weight perfect matching and the Bellman-Ford algorithm for getting the envy-free prices, the overall time complexity for solving the problem is $O(n^3)$. We propose a method based on dynamic programming design strategy that seeks the optimal envy-free prices by increasing the consumers' utilities, which has the same cubic complexity time as the aforementioned approach, but whose theoretical and empirical results indicate that our method performs faster than the shortest paths strategy, obtaining an average time reduction in determining optimal envy-free prices of approximately 48\%.
Submitted 24 August, 2023;
originally announced August 2023.
-
Integrated Design Fabrication and Control of a Bioinspired Multimaterial Soft Robotic Hand
Authors:
Samuel Alves,
Mihail Babcinschi,
Afonso Silva,
Diogo Neto,
Diogo Fonseca,
Pedro Neto
Abstract:
Machines that mimic humans have inspired scientists for centuries. Bio-inspired soft robotic hands are a good example of such an endeavor, featuring intrinsic material compliance and continuous motion to deal with uncertainty and adapt to unstructured environments. Recent research led to impactful achievements in functional designs, modeling, fabrication, and control of soft robots. Nevertheless, the full realization of life-like movements is still challenging to achieve, often based on trial-and-error considerations from design to fabrication, consuming time and resources. In this study, a soft robotic hand is proposed, composed of soft actuator cores and an exoskeleton, featuring a multi-material design aided by finite element analysis (FEA) to define the hand geometry and promote finger's bendability. The actuators are fabricated using molding and the exoskeleton is 3D-printed in a single step. An ON-OFF controller keeps the set fingers' inner pressures related to specific bending angles, even in the presence of leaks. The FEA numerical results were validated by experimental tests, as well as the ability of the hand to grasp objects with different shapes, weights and sizes. This integrated solution will make soft robotic hands more available to people, at a reduced cost, avoiding the time-consuming design-fabrication trial-and-error processes.
Submitted 8 August, 2023;
originally announced August 2023.
-
Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis
Authors:
Eufrásio de A. Lima Neto,
Jonathan Bailiss,
Axel Finke,
Jo Miller,
Georgina Cosma
Abstract:
Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were built, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
Submitted 13 July, 2023;
originally announced July 2023.
-
ISP meets Deep Learning: A Survey on Deep Learning Methods for Image Signal Processing
Authors:
Matheus Henrique Marques da Silva,
Jhessica Victoria Santos da Silva,
Rodrigo Reis Arrais,
Wladimir Barroso Guedes de Araújo Neto,
Leonardo Tadeu Lopes,
Guilherme Augusto Bileki,
Iago Oliveira Lima,
Lucas Borges Rondon,
Bruno Melo de Souza,
Mayara Costa Regazio,
Rodolfo Coelho Dalapicola,
Claudio Filipi Gonçalves dos Santos
Abstract:
The entire Image Signal Processor (ISP) of a camera relies on several processes to transform the data from the Color Filter Array (CFA) sensor, such as demosaicing, denoising, and enhancement. These processes can be executed either by some hardware or via software. In recent years, Deep Learning has emerged as one solution for some of them or even to replace the entire ISP using a single neural network for the task. In this work, we investigated several recent pieces of research in this area and provide deeper analysis and comparison among them, including results and possible points of improvement for future researchers.
Submitted 23 May, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Authors:
Ana Cláudia Akemi Matsuki de Faria,
Felype de Castro Bastos,
José Victor Nogueira Alves da Silva,
Vitor Lopes Fabris,
Valeska de Sousa Uchoa,
Décio Gonçalves de Aguiar Neto,
Claudio Filipi Goncalves dos Santos
Abstract:
Visual Question Answering (VQA) is an emerging area of interest for researchers, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. In addition, 6 datasets were analyzed, with download links provided. In this work, several recent pieces of research in this area were investigated and a deeper analysis and comparison among them were provided, including results, the state-of-the-art, common errors, and possible points of improvement for future researchers.
Submitted 2 June, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
eXplainable Artificial Intelligence on Medical Images: A Survey
Authors:
Matteus Vargas Simão da Silva,
Rodrigo Reis Arrais,
Jhessica Victoria Santos da Silva,
Felipe Souza Tânios,
Mateus Antonio Chinelatto,
Natalia Backhaus Pereira,
Renata De Paris,
Lucas Cesar Ferreira Domingos,
Rodrigo Dória Villaça,
Vitor Lopes Fabris,
Nayara Rossi Brito da Silva,
Ana Claudia Akemi Matsuki de Faria,
Jose Victor Nogueira Alves da Silva,
Fabiana Cristina Queiroz de Oliveira Marucci,
Francisco Alves de Souza Neto,
Danilo Xavier Silva,
Vitor Yukio Kondo,
Claudio Filipi Gonçalves dos Santos
Abstract:
Over the last few years, the number of works about deep learning applied to the medical field has increased enormously. A rigorous assessment of these models is required to explain these results to all people involved in medical exams. A recent field in the machine learning area is explainable artificial intelligence, also known as XAI, which aims to explain the results of such black box models to permit the desired assessment. This survey analyses several recent studies in the XAI field applied to medical diagnosis research, allowing some explainability of the machine learning results in several different diseases, such as cancers and COVID-19.
Submitted 12 May, 2023;
originally announced May 2023.
-
Bug Analysis in Jupyter Notebook Projects: An Empirical Study
Authors:
Taijara Loiola de Santana,
Paulo Anselmo da Mota Silveira Neto,
Eduardo Santana de Almeida,
Iftekhar Ahmed
Abstract:
Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, there has been no thorough study to understand Jupyter development challenges from the practitioners' point of view. This paper presents a systematic study of bugs and challenges that Jupyter practitioners face through a large-scale empirical investigation. We mined 14,740 commits from 105 GitHub open-source projects with Jupyter notebook code. Next, we analyzed 30,416 Stack Overflow posts which gave us insights into bugs that practitioners face when developing Jupyter notebook projects. Finally, we conducted nineteen interviews with data scientists to uncover more details about Jupyter bugs and to gain insights into Jupyter developers' challenges. We propose a bug taxonomy for Jupyter projects based on our results. We also highlight bug categories, their root causes, and the challenges that Jupyter practitioners face.
Submitted 13 October, 2022;
originally announced October 2022.
-
Automated Black-Box Boundary Value Detection
Authors:
Felix Dobslaw,
Robert Feldt,
Francisco de Oliveira Neto
Abstract:
The input domain of software systems can typically be divided into sub-domains for which the outputs are similar. To ensure high quality it is critical to test the software on the boundaries between these sub-domains. Consequently, boundary value analysis and testing has been part of the toolbox of software testers for long and is typically taught early to students. However, despite its many argued benefits, boundary value analysis for a given specification or piece of software is typically described in abstract terms which allow for variation in how testers apply it.
Here we propose an automated, black-box boundary value detection method to support software testers in systematic boundary value analysis with consistent results. The method builds on a metric to quantify the level of boundariness of test inputs: the program derivative. By coupling it with search algorithms we find and rank pairs of inputs as good boundary candidates, i.e. inputs close together but with outputs far apart. We implement our AutoBVA approach and evaluate it on a curated dataset of example programs. Our results indicate that even with a simple and generic program derivative variant in combination with broad sampling over the input space, interesting boundary candidates can be identified.
Submitted 19 July, 2022;
originally announced July 2022.
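The idea of scoring input pairs by how far apart their outputs are relative to how close the inputs are can be illustrated with a minimal sketch. This is not the AutoBVA implementation: the program under test (`grade`), the two distance functions, and the exhaustive scan over neighbouring integers (a stand-in for the search and sampling mentioned in the abstract) are all assumptions made for illustration.

```python
def program_derivative(f, a, b, d_in, d_out):
    """Output distance divided by input distance for the input pair (a, b)."""
    din = d_in(a, b)
    if din == 0:
        raise ValueError("the two inputs must differ")
    return d_out(f(a), f(b)) / din


def grade(score):
    # Hypothetical program under test: maps an integer score to a label.
    if score < 50:
        return "fail"
    if score < 80:
        return "pass"
    return "distinction"


def abs_diff(x, y):
    # Assumed input distance: absolute difference between integers.
    return abs(x - y)


def differs(x, y):
    # Assumed output distance: 1.0 if the outputs differ, else 0.0.
    return 0.0 if x == y else 1.0


def rank_candidates(f, inputs, top=2):
    # Score every pair of neighbouring inputs and keep the highest-ranked ones.
    scored = [
        (program_derivative(f, a, a + 1, abs_diff, differs), a, a + 1)
        for a in inputs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


candidates = rank_candidates(grade, range(0, 100))
```

In this toy setting the two top-ranked pairs straddle the thresholds at 50 and 80, which is the kind of boundary candidate the abstract describes.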
-
Practical Skills Demand Forecasting via Representation Learning of Temporal Dynamics
Authors:
Maysa M. Garcia de Macedo,
Wyatt Clarke,
Eli Lucherini,
Tyler Baldwin,
Dilermando Queiroz Neto,
Rogerio de Paula,
Subhro Das
Abstract:
Rapid technological innovation threatens to leave much of the global workforce behind. Today's economy juxtaposes white-hot demand for skilled labor against stagnant employment prospects for workers unprepared to participate in a digital economy. It is a moment of peril and opportunity for every country, with outcomes measured in long-term capital allocation and the life satisfaction of billions of workers. To meet the moment, governments and markets must find ways to quicken the rate at which the supply of skills reacts to changes in demand. More fully and quickly understanding labor market intelligence is one route. In this work, we explore the utility of time series forecasts to enhance the value of skill demand data gathered from online job advertisements. This paper presents a pipeline which makes one-shot multi-step forecasts into the future using a decade of monthly skill demand observations based on a set of recurrent neural network methods. We compare the performance of a multivariate model versus a univariate one, analyze how correlation between skills can influence multivariate model results, and present predictions of demand for a selection of skills practiced by workers in the information technology industry.
Submitted 18 May, 2022;
originally announced May 2022.
-
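Creation and application of a tool to support the teaching of algorithms and computer programming (original title: Criação e aplicação de ferramenta para auxiliar no ensino de algoritmos e programação de computadores)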
Authors:
Afonso Henriques Fontes Neto Segundo,
Joel Sotero da Cunha Neto,
Maria Daniela Santabaia Cavalcanti,
Paulo Cirillo Souza Barbosa,
Raul Fontenele Santana
Abstract:
Knowledge about programming is part of the knowledge matrix that will be required of the professionals of the future. Based on this, this work aims to report the development of a teaching tool developed during the monitoring program of the Algorithm and Computer Programming discipline of the University of Fortaleza. The tool combines the knowledge acquired in the books, with a language closer to the students, using video lessons and exercises proposed, with all the content available on the internet. The preliminary results were positive, with the students approving this new approach and believing that it could contribute to a better performance in the discipline.
Submitted 31 March, 2022;
originally announced April 2022.
-
Applying PBL in the Development and Modeling of kinematics for Robotic Manipulators with Interdisciplinarity between Computer-Assisted Project, Robotics, and Microcontrollers
Authors:
Afonso Henriques Fontes Neto Segundo,
Joel Sotero da Cunha Neto,
Paulo Cirillo Souza Barbosa,
Raul Fontenele Santana
Abstract:
Considering the difficulty students have in calculating the direct and inverse kinematics of a robotic manipulator using only conventional classroom tools, this article proposes the application of Project-Based Learning (PBL) through the design, development, and mathematical modeling of a robotic manipulator as an integrative project of the disciplines of Industrial Robotics, Microcontrollers and Computer Assisted Design with students of the Control and Automation Engineering course of the University of Fortaleza. Once designed and machined, the manipulator arm was assembled using servo motors connected to a microcontrolled prototyping board, and its kinematics were then calculated. Finally, we present the results that the project brought to the learning of the disciplines from the perspective of the tutor and students.
Submitted 31 March, 2022;
originally announced March 2022.
-
Development of a robotic manipulator: Applying interdisciplinarity in Computer Assister Project, Microcontrollers and Industrial Robotics
Authors:
Afonso Henriques Fontes Neto Segundo,
Joel Sotero da Cunha Neto,
Reginaldo Florencio da Silva,
Paulo Cirillo Souza Barbosa,
Raul Fontenele Santana
Abstract:
This work was conceived based on Project-Based Learning (PBL) and presents the design, development and mathematical modeling steps of a low-cost robotic manipulator with five degrees of freedom through an interdisciplinary project linking three very important disciplines of the Control and Automation Engineering course of the University of Fortaleza: Computer Aided Design, Microcontrollers and Industrial Robotics. Finally, we present the results that the project brought to the learning of the disciplines from the perspective of the tutor and students.
Submitted 31 March, 2022;
originally announced March 2022.
-
Development of a simulation tool to support the teaching of industrial robotics (original title: Desenvolvimento de ferramenta de simulação para auxílio no ensino da disciplina de robótica industrial)
Authors:
Afonso Henriques Fontes Neto Segundo,
Joel Sotero da Cunha Neto,
Halisson Alves de Oliveira,
Átila Girão de Oliveira,
Reginaldo Florencio da Silva
Abstract:
Currently, robotics is one of the fastest growing areas not only in the industrial sector but also in the consumer and service sectors. Several areas benefit from the technological advancement of robotics, especially the industrial area, which benefits from gains in productivity and quality. However, to meet this growing demand, newly graduated professionals need a deeper understanding of how to design and control a robotic manipulator. Obtaining this more in-depth knowledge of robotics requires experience with a real robotic manipulator, since practice is a much more efficient way of learning than theory. However, a robotic arm is not a cheap investment, and its maintenance is not cheap either. Therefore, many educational institutions are not able to provide this type of experience to their students. With this in mind, and through the use of Unity 3D, which is a game development software, a robotic arm simulator has been developed to correlate classroom theory with what actually happens in practice. The robotic manipulators implemented in this simulator can be controlled by both inverse kinematics (which is the industry standard) and direct kinematics.
Submitted 31 March, 2022;
originally announced March 2022.
-
Parametrized constant-depth quantum neuron
Authors:
Jonathan H. A. de Carvalho,
Fernando M. de Paula Neto
Abstract:
Quantum computing has been revolutionizing the development of algorithms. However, only noisy intermediate-scale quantum devices are available currently, which imposes several restrictions on the circuit implementation of quantum algorithms. In this paper, we propose a framework that builds quantum neurons based on kernel machines, where the quantum neurons differ from each other by their feature space mappings. Besides contemplating previous schemes, our generalized framework can instantiate quantum neurons with other feature mappings. We present here a neuron that applies a tensor-product feature mapping to an exponentially larger space. The proposed neuron is implemented by a circuit of constant depth with a linear number of elementary single-qubit gates. The existing neuron applies a phase-based feature mapping with an exponentially expensive circuit implementation, even using multi-qubit gates. Additionally, the proposed neuron has parameters that can change its activation function shape. Here, we show the activation function shape of each quantum neuron. It turns out that parametrization allows the proposed neuron to optimally fit underlying patterns that the existing neuron cannot fit, as demonstrated in the toy problems addressed here. The feasibility of those quantum neuron solutions is also contemplated in the demonstration through executions on a quantum simulator. Finally, we compare those kernel-based quantum neurons in the problem of handwritten digit recognition, where the performances of quantum neurons that implement classical activation functions are also contrasted here. The repeated evidence of the parametrization potential achieved in real-life problems allows concluding that this work provides a quantum neuron with improved discriminative abilities. As a consequence, the generalized framework of quantum neurons can contribute toward practical quantum advantage.
Submitted 28 September, 2023; v1 submitted 24 February, 2022;
originally announced February 2022.
-
An Open Platform for Research about Cognitive Load in Virtual Reality
Authors:
Olivier Augereau,
Gabriel Brocheton,
Pedro Paulo Do Prado Neto
Abstract:
The cognitive load can be used to assess if someone is struggling while performing a task. It can be used in many different situations such as in driving, piloting, studying, playing, working, etc. This information can help to design better systems and even to create interactive systems that can be aware of the user's cognitive load and adapt itself to the user. We propose an open source platform that can be used for doing research about cognitive load in virtual reality (VR). Our platform can be used for stimulating cognitive load through several VR scenes and for analyzing cognitive load through objective and subjective measurements.
Submitted 17 January, 2022;
originally announced January 2022.
-
Automated Support for Unit Test Generation: A Tutorial Book Chapter
Authors:
Afonso Fontes,
Gregory Gay,
Francisco Gomes de Oliveira Neto,
Robert Feldt
Abstract:
Unit testing is a stage of testing where the smallest segment of code that can be tested in isolation from the rest of the system - often a class - is tested. Unit tests are typically written as executable code, often in a format provided by a unit testing framework such as pytest for Python.
Creating unit tests is a time and effort-intensive process with many repetitive, manual elements. To illustrate how AI can support unit testing, this chapter introduces the concept of search-based unit test generation. This technique frames the selection of test input as an optimization problem - we seek a set of test cases that meet some measurable goal of a tester - and unleashes powerful metaheuristic search algorithms to identify the best possible test cases within a restricted timeframe. This chapter introduces two algorithms that can generate pytest-formatted unit tests, tuned towards coverage of source code statements. The chapter concludes by discussing more advanced concepts and gives pointers to further reading for how artificial intelligence can support developers and testers when unit testing software.
Submitted 26 October, 2021;
originally announced October 2021.
-
EsmamDS: A more diverse exceptional survival model mining approach
Authors:
Juliana Barcellos Mattos,
Paulo S. G. de Mattos Neto,
Renato Vimieiro
Abstract:
A variety of works in the literature strive to uncover the factors associated with survival behaviour. However, the computational tools to provide such information are global models designed to predict if or when a (survival) event will occur. When approaching the problem of explaining differences in survival behaviour, those approaches rely on (assumptions of) predictive features followed by risk stratification. In other words, they lack the ability to discover new information on factors related to survival. In contrast, we approach such a problem from the perspective of descriptive supervised pattern mining to discover local patterns associated with different survival behaviours. Hence, we introduce the EsmamDS algorithm: an Exceptional Model Mining framework to provide straightforward characterisations of subgroups presenting unusual survival models -- given by the Kaplan-Meier estimates. This work builds on the Esmam algorithm to address the problem of pattern redundancy and provide a more informative and diverse characterisation of survival behaviour.
Submitted 6 September, 2021;
originally announced September 2021.
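The Kaplan-Meier estimate mentioned in the abstract is the product-limit estimator S(t) = Π (1 - d_i / n_i), taken over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number of subjects still at risk. The sketch below only illustrates that estimator on made-up data; it is not the EsmamDS algorithm, and the function name and sample values are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    durations: observed times; events: 1 if the event occurred, 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Made-up subgroup: five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Comparing such curves between a subgroup and the rest of the population is one way to see what "unusual survival models" could mean in practice.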
-
STN PLAD: A Dataset for Multi-Size Power Line Assets Detection in High-Resolution UAV Images
Authors:
André Luiz Buarque Vieira-e-Silva,
Heitor Felix,
Thiago de Menezes Chaves,
Francisco Paulo Magalhães Simões,
Veronica Teichrieb,
Michel Mozinho dos Santos,
Hemir da Cunha Santiago,
Virginia Adélia Cordeiro Sgotti,
Henrique Baptista Duffles Teixeira Lott Neto
Abstract:
Many power line companies are using UAVs to perform their inspection processes instead of putting their workers at risk by making them climb high voltage power line towers, for instance. A crucial task for the inspection is to detect and classify assets in the power transmission lines. However, public data related to power line assets are scarce, preventing a faster evolution of this area. This work proposes the Power Line Assets Dataset, containing high-resolution and real-world images of multiple high-voltage power line components. It has 2,409 annotated objects divided into five classes: transmission tower, insulator, spacer, tower plate, and Stockbridge damper, which vary in size (resolution), orientation, illumination, angulation, and background. This work also presents an evaluation with popular deep object detection methods, showing considerable room for improvement. The STN PLAD dataset is publicly available at https://github.com/andreluizbvs/PLAD.
Submitted 2 September, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
TRANSMUT-SPARK: Transformation Mutation for Apache Spark
Authors:
Joao Batista de Souza Neto,
Anamaria Martins Moreira,
Genoveva Vargas-Solar,
Martin A. Musicante
Abstract:
We propose TRANSMUT-Spark, a tool that automates the mutation testing process of Big Data processing code within Spark programs. Apache Spark is an engine for Big Data Processing. It hides the complexity inherent to Big Data parallel and distributed programming and processing through built-in functions, underlying parallel processes, and data management strategies. Nonetheless, programmers must cleverly combine these functions within programs and guide the engine to use the right data management strategies to exploit the large number of computational resources required by Big Data processing and avoid substantial production losses. Many programming details in data processing code within Spark programs are prone to false statements that need to be correctly and automatically tested. This paper explores the application of mutation testing in Spark programs, a fault-based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces the TRANSMUT-Spark solution for testing Spark programs. TRANSMUT-Spark automates the most laborious steps of the process and fully executes the mutation testing process. The paper describes how the tool automates the mutants generation, test execution, and adequacy analysis phases of mutation testing with TRANSMUT-Spark. It also discusses the results of experiments that were carried out to validate the tool to argue its scope and limitations.
Submitted 5 August, 2021;
originally announced August 2021.
-
An Abstract View of Big Data Processing Programs
Authors:
Joao Batista de Souza Neto,
Anamaria Martins Moreira,
Genoveva Vargas-Solar,
Martin A. Musicante
Abstract:
This paper proposes a model for specifying data flow based parallel data processing programs agnostic of target Big Data processing frameworks. The paper focuses on the formal abstract specification of non-iterative and iterative programs, generalizing the strategies adopted by data flow Big Data processing frameworks. The proposed model relies on Monoid Algebra and Petri Nets to abstract Big Data processing programs in two levels: a high level representing the program data flow and a lower level representing data transformation operations (e.g., filtering, aggregation, join). We extend the model for data processing programs proposed in [1], to enable the use of iterative programs. The general specification of iterative data processing programs implemented by data flow-based parallel programming models is essential given the democratization of iterative and greedy Big Data analytics algorithms. Indeed, these algorithms call for revisiting parallel programming models to express iterations. The paper gives a comparative analysis of the iteration strategies proposed by Apache Spark, DryadLINQ, Apache Beam and Apache Flink. It discusses how the model generalizes these strategies.
Submitted 5 August, 2021;
originally announced August 2021.
-
On Applying the Lackadaisical Quantum Walk Algorithm to Search for Multiple Solutions on Grids
Authors:
Jonathan H. A. de Carvalho,
Luciano S. de Souza,
Fernando M. de Paula Neto,
Tiago A. E. Ferreira
Abstract:
Quantum computing promises to improve the information processing power to levels unreachable by classical computation. Quantum walks are heading the development of quantum algorithms for searching information on graphs more efficiently than their classical counterparts. A quantum-walk-based algorithm standing out in the literature is the lackadaisical quantum walk. The lackadaisical quantum walk is an algorithm developed to search graph structures whose vertices have a self-loop of weight $l$. This paper addresses several issues related to applying the lackadaisical quantum walk to search for multiple solutions on grids successfully. Firstly, we show that only one of the two stopping conditions found in the literature is suitable for simulations. We also demonstrate that the final success probability depends on both the space density of solutions and the relative distance between solutions. Furthermore, this work generalizes the lackadaisical quantum walk to search for multiple solutions on grids of arbitrary dimensions. In addition, we propose an optimal adjustment of the self-loop weight $l$ for such $d$-dimensional grids. It turns out other fits of $l$ found in the literature are particular cases. Finally, we observe a two-to-one relation between the steps of the lackadaisical quantum walk and Grover's algorithm, which requires modifications in the stopping condition. In conclusion, this work deals with practical issues one should consider when applying the lackadaisical quantum walk, besides expanding the technique to a broader range of search problems.
Submitted 9 January, 2023; v1 submitted 11 June, 2021;
originally announced June 2021.
-
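Continuous Integration and Delivery for mobile applications developed in React Native (original title: Integração e Entrega Contínua para aplicações móveis desenvolvidas em React Native)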
Authors:
Pedro José de Souza Neto,
Vinicius Cardoso Garcia
Abstract:
Continuous integration and continuous delivery are not new for developers who create web applications; however, in the development of mobile applications this practice is still not very common, mainly because of the challenges in distributing the application. In the face of the growing number of applications, higher quality requirements, and ever-shorter delivery times, delivering healthy code is often extremely important to keep up with the competition. The purpose of this work is to implement a continuous integration and delivery pipeline for mobile applications developed in React Native. It intends to automate the build and delivery process for applications developed with this technology.
Submitted 30 March, 2021;
originally announced March 2021.
-
Software Development During COVID-19 Pandemic: an Analysis of Stack Overflow and GitHub
Authors:
Pedro Almir Martins de Oliveira,
Pedro de Alcântara dos Santos Neto,
Gleison Silva,
Irvayne Ibiapina,
Werney Lira,
Rossana Maria de Castro Andrade
Abstract:
The new coronavirus became a severe health issue for the world. This situation has motivated studies in different areas to combat this pandemic. In software engineering, we point out data visualization projects to follow the disease evolution, machine learning to estimate the pandemic behavior, and computer vision processing radiologic images. Most of these projects are stored in version control systems, and there are discussions about them in Question & Answer websites. In this work, we conducted a mining software repositories study on a large number of questions and projects aiming to find trends that could help researchers and practitioners to fight against the coronavirus. We analyzed 1,190 questions from Stack Overflow and Data Science Q&A and 60,352 GitHub projects. We identified a correlation between the questions and projects throughout the pandemic. The main questions about coronavirus are how-to, related to web scraping and data visualization, using Python, JavaScript, and R. The most recurrent GitHub projects are machine learning projects, using JavaScript, Python, and Java.
Submitted 9 March, 2021;
originally announced March 2021.
-
Using mutation testing to measure behavioural test diversity
Authors:
Francisco Gomes de Oliveira Neto,
Felix Dobslaw,
Robert Feldt
Abstract:
Diversity has been proposed as a key criterion to improve testing effectiveness and efficiency. It can be used to optimise large test repositories but also to visualise test maintenance issues and raise practitioners' awareness about waste in test artefacts and processes. Even though these diversity-based testing techniques aim to exercise diverse behavior in the system under test (SUT), the diversity has mainly been measured on and between artefacts (e.g., inputs, outputs or test scripts). Here, we introduce a family of measures to capture behavioural diversity (b-div) of test cases by comparing their executions and failure outcomes. Using failure information to capture the SUT behaviour has been shown to improve effectiveness of history-based test prioritisation approaches. However, history-based techniques require reliable test execution logs which are often not available or can be difficult to obtain due to flaky tests, scarcity of test executions, etc. To be generally applicable we instead propose to use mutation testing to measure behavioral diversity by running the set of test cases on various mutated versions of the SUT. Concretely, we propose two specific b-div measures (based on accuracy and the Matthews correlation coefficient, respectively) and compare them with artefact-based diversity (a-div) for prioritising the test suites of 6 different open-source projects. Our results show that our b-div measures outperform a-div and random selection in all of the studied projects. The improvement is substantial, with an average increase in average percentage of faults detected (APFD) of between 19% and 31% depending on the size of the subset of prioritised tests.
Submitted 18 October, 2020;
originally announced October 2020.
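One plausible reading of comparing test cases by their outcomes on mutants is sketched below: each test gets a vector recording which mutants it kills, and two tests are compared by the agreement (accuracy) or the Matthews correlation coefficient of those vectors. The kill vectors, the function names, and the way agreement is turned into a distance are assumptions for illustration, not the paper's exact definitions.

```python
from math import sqrt


def confusion(u, v):
    """Count agreement and disagreement between two kill vectors of equal length."""
    tp = sum(1 for x, y in zip(u, v) if x and y)
    tn = sum(1 for x, y in zip(u, v) if not x and not y)
    fp = sum(1 for x, y in zip(u, v) if not x and y)
    fn = sum(1 for x, y in zip(u, v) if x and not y)
    return tp, tn, fp, fn


def accuracy(u, v):
    # Fraction of mutants on which both tests have the same outcome.
    tp, tn, _, _ = confusion(u, v)
    return (tp + tn) / len(u)


def mcc(u, v):
    # Matthews correlation coefficient; defined as 0.0 here when the denominator is zero.
    tp, tn, fp, fn = confusion(u, v)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def behavioural_distance(u, v):
    # Assumed conversion: tests that agree on every mutant have distance 0.
    return 1.0 - accuracy(u, v)


# Hypothetical kill vectors: True means the test fails on (kills) that mutant.
t1 = [True, True, False, False]
t2 = [True, True, False, False]
t3 = [False, False, True, True]
```

Under these assumptions `t1` and `t2` are behaviourally redundant, while `t1` and `t3` are maximally different, so a prioritisation step would prefer to keep `t1` and `t3` together.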
-
STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data
Authors:
Guizhen Wang,
Jingjing Guo,
Mingjie Tang,
José Florencio de Queiroz Neto,
Calvin Yau,
Anas Daghistani,
Morteza Karimzadeh,
Walid G. Aref,
David S. Ebert
Abstract:
Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are often biased, as most researchers have primarily focused on reducing computational latency. Biased sampling approaches select data with unequal probabilities and produce results that do not match the exact data distribution, leading end users to incorrect interpretations. In this paper, we propose a novel approach to perform unbiased online sampling of large spatiotemporal data. The proposed approach ensures the same probability of selection to every point that qualifies the specifications of a user's multidimensional query. To achieve unbiased sampling for accurate representative interactive visualizations, we design a novel data index and an associated sample retrieval plan. Our proposed sampling approach is suitable for a wide variety of visual analytics tasks, e.g., tasks that run aggregate queries of spatiotemporal data. Extensive experiments confirm the superiority of our approach over a state-of-the-art spatial online sampling technique, demonstrating that within the same computational time, data samples generated in our approach are at least 50% more accurate in representing the actual spatial distribution of the data and enable approximate visualizations to present closer visual appearances to the exact ones.
Submitted 29 August, 2020;
originally announced August 2020.
-
A Deep Dive on the Impact of COVID-19 in Software Development
Authors:
Paulo Anselmo da Mota Silveira Neto,
Umme Ayda Mannan,
Eduardo Santana de Almeida,
Nachiappan Nagappan,
David Lo,
Pavneet Singh Kochhar,
Cuiyun Gao,
Iftekhar Ahmed
Abstract:
Context: The COVID-19 pandemic has impacted different business sectors around the world. Objective: This study investigates the impact of COVID-19 on software projects and software development professionals. Method: We conducted a mining software repository study based on 100 GitHub projects developed in Java using ten different metrics. Next, we surveyed 279 software development professionals to better understand the impact of COVID-19 on daily activities and wellbeing. Results: We identified 12 observations related to productivity, code quality, and wellbeing. Conclusions: Our findings highlight that the impact of COVID-19 is not binary (reduce productivity vs. increase productivity) but rather a spectrum. For many of our observations, substantial proportions of respondents have differing opinions from each other. We believe that more research is needed to uncover specific conditions that cause certain outcomes to be more prevalent.
Submitted 16 August, 2020;
originally announced August 2020.
-
Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Authors:
Manoel Veríssimo dos Santos Neto,
Ayrton Denner da Silva Amaral,
Nádia Félix Felipe da Silva,
Anderson da Silva Soares
Abstract:
In this paper, we describe a methodology to predict sentiment in code-mixed tweets (Hindi-English). Our team, called verissimo.manoel in CodaLab, developed an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and XLNET). The final classification algorithm was an ensemble of some predictions of all softmax values from these four models. This architecture was used and evaluated in the context of the SemEval 2020 challenge (task 9), and our system achieved an F1 score of 72.7%.
Submitted 28 July, 2020;
originally announced August 2020.
-
Critical Point Calculations by Numerical Inversion of Functions
Authors:
C. N. Parajara,
G. M. Platt,
F. D. Moura Neto,
M. Escobar,
G. B. Libotte
Abstract:
In this work, we propose a new approach to the problem of critical point calculation, based on the formulation of Heidemann and Khalil (1980). This leads to a $2 \times 2$ system of nonlinear algebraic equations in temperature and molar volume, which makes possible the prediction of critical points of the mixture through an adaptation of the technique of inversion of functions from the plane to the plane, proposed by Malta, Saldanha, and Tomei (1993). The results are compared to those obtained by three methodologies: ($i$) the classical method of Heidemann and Khalil (1980), which uses a double-loop structure, also in terms of temperature and molar volume; ($ii$) the algorithm of Dimitrakopoulos, Jia, and Li (2014), which employs a damped Newton algorithm; and ($iii$) the methodology proposed by Nichita and Gomez (2010), based on a stochastic algorithm. The proposed methodology proves to be robust and accurate in the prediction of critical points, and provides a global view of the nonlinear problem.
Submitted 30 May, 2020;
originally announced June 2020.
-
An Empirical Study of Bots in Software Development -- Characteristics and Challenges from a Practitioner's Perspective
Authors:
Linda Erlenhov,
Francisco Gomes de Oliveira Neto,
Philipp Leitner
Abstract:
Software engineering bots - automated tools that handle tedious tasks - are increasingly used by industrial and open source projects to improve developer productivity. Current research in this area is held back by a lack of consensus on what software engineering bots (DevBots) actually are, what characteristics distinguish them from other tools, and what benefits and challenges are associated with DevBot usage. In this paper we report on a mixed-method empirical study of DevBot usage in industrial practice. We report on findings from interviewing 21 and surveying a total of 111 developers. We identify three different personas among DevBot users (focusing on autonomy, chat interfaces, and "smartness"), each with different definitions of what a DevBot is, why developers use them, and what they struggle with. We conclude that future DevBot research should situate their work within our framework, to clearly identify what type of bot the work targets, and what advantages practitioners can expect. Further, we find that there currently is a lack of general purpose "smart" bots that go beyond simple automation tools or chat interfaces. This is problematic, as we have seen that such bots, if available, can have a transformative effect on the projects that use them.
Submitted 29 October, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Challenges and guidelines on designing test cases for test bots
Authors:
Linda Erlenhov,
Francisco Gomes de Oliveira Neto,
Martin Chukaleski,
Samer Daknache
Abstract:
Test bots are automated testing tools that autonomously and periodically run a set of test cases that check whether the system under test meets the requirements set forth by the customer. The automation decreases the amount of time a development team spends on testing. As development projects become larger, it is important to focus on improving the test bots by designing more effective test cases because otherwise time and usage costs can increase greatly and misleading conclusions from test results might be drawn, such as false positives in the test execution. However, literature currently lacks insights on how test case design affects the effectiveness of test bots. This paper uses a case study approach to investigate those effects by identifying challenges in designing tests for test bots. Our results include guidelines for test design schema for such bots that support practitioners in overcoming the challenges mentioned by participants during our study.
Submitted 21 April, 2020;
originally announced April 2020.
-
Data integration and prediction models of photovoltaic production from Brazilian northeastern
Authors:
Hugo Abreu Mendes,
Henrique Ferreira Nunes,
Manoel da Nobrega Marinho,
Paulo Salgado Gomes de Mattos Neto
Abstract:
All productive branches of society need an estimate to be able to control their expenses well. In the energy business, electric utilities use this information to control the power flow in the grid. For better energy production estimation of photovoltaic systems, it is necessary to join multiple geospatial and meteorological variables. This work proposes the creation of a satellite data integration platform, with production estimation models, base station measurements, and actual production capacity. It presents statistical, probabilistic, and artificial intelligence models that generate spatial and temporal production estimates that could improve production gains as well as facilitate the monitoring and supervision of new enterprises.
Submitted 6 March, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Hybrid Coded Replication in LoRa Networks
Authors:
Jean Michel de Souza Sant'Ana,
Arliones Hoeller,
Richard Demo Souza,
Samuel Montejo-Sánchez,
Hirley Alves,
Mario de Noronha Neto
Abstract:
Low Power Wide Area Networks (LPWAN) are wireless connectivity solutions for Internet-of-Things (IoT) applications, including industrial automation. Among the several LPWAN technologies, LoRaWAN has been extensively addressed by the research community and the industry. However, the reliability and scalability of LoRaWAN are still uncertain. One of the techniques to increase the reliability of LoRaWAN is message replication, which exploits time diversity. This paper proposes a novel hybrid coded message replication scheme that interleaves simple repetition and a recently proposed coded replication method. We analyze the optimization of the proposed scheme under minimum reliability requirements and show that it enhances the network performance without requiring additional transmit power compared to the competing replication techniques.
Submitted 9 January, 2020;
originally announced January 2020.
-
Boundary Value Exploration for Software Analysis
Authors:
Felix Dobslaw,
Francisco Gomes de Oliveira Neto,
Robert Feldt
Abstract:
For software to be reliable and resilient, it is widely accepted that tests must be created and maintained alongside the software itself. One safeguard from vulnerabilities and failures in code is to ensure correct behavior on the boundaries between the input space sub-domains. So-called boundary value analysis (BVA) and boundary value testing (BVT) techniques aim to exercise those boundaries and increase test effectiveness. However, the concepts of BVA and BVT themselves are not generally well defined, and it is not clear how to identify relevant sub-domains, and thus the boundaries delineating them, given a specification. This has limited adoption and hindered automation. We clarify BVA and BVT and introduce Boundary Value Exploration (BVE) to describe techniques that support them by helping to detect and identify boundary inputs. Additionally, we propose two concrete BVE techniques based on information-theoretic distance functions: (i) an algorithm for boundary detection and (ii) the usage of software visualization to explore the behavior of the software under test and identify its boundary behavior. As an initial evaluation, we apply these techniques on a much used and well-tested date handling library. Our results reveal questionable behavior at boundaries highlighted by our techniques. In conclusion, we argue that the boundary value exploration that our techniques enable is a step towards automated boundary value analysis and testing, fostering their wider use and improving test effectiveness and efficiency.
Submitted 12 October, 2020; v1 submitted 18 January, 2020;
originally announced January 2020.
-
Estimating Return on Investment for GUI Test Automation Tools
Authors:
Felix Dobslaw,
Robert Feldt,
David Michaelsson,
Patrick Haar,
Francisco G. de Oliveira Neto,
Richard Torkar
Abstract:
Automated graphical user interface (GUI) tests can reduce manual testing activities and increase test frequency. This motivates the conversion of manual test cases into automated GUI tests. However, it is not clear whether such automation is cost-effective given that GUI automation scripts add to the code base and demand maintenance as a system evolves. In this paper, we introduce a method for estimating maintenance cost and Return on Investment (ROI) for Automated GUI Testing (AGT). The method utilizes the existing source code change history and can also be used to evaluate other testing or quality assurance automation technologies. We evaluate the method for a real-world, industrial software system and compare two fundamentally different AGT tools, namely Selenium and EyeAutomate, to estimate and compare their ROI. We also report on their defect-finding capabilities and usability. The quantitative data is complemented by interviews with employees at the case company. The method was successfully applied and estimated maintenance cost and ROI for both tools are reported. Overall, the study supports earlier results showing that implementation time is the leading cost for introducing AGT. The findings further suggest that while EyeAutomate tests are significantly faster to implement, Selenium tests require more of a programming background but less maintenance.
Submitted 1 November, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
-
AllocTC-Sharing: A New Bandwidth Allocation Model for DS-TE Networks
Authors:
Rafael F. Reale,
Walter da C. P. Neto,
Joberto S. B. Martins
Abstract:
DiffServ-aware MPLS-TE (DS-TE) allows bandwidth reservation for Traffic Classes (TCs) in MPLS-based engineered networks and, as such, improves the basic MPLS-TE model. In DS-TE networks, per-Class quality of service guarantees are provided while making it possible to achieve improved network utilization. DS-TE requires the use of a Bandwidth Allocation Model (BAM) that establishes the amount of bandwidth per-Class and any eventual sharing among them. This paper proposes a new bandwidth allocation model (AllocTC-Sharing) in which the higher priority traffic classes are allowed to use non-allocated resources of lower priority traffic classes and vice versa. By adopting this dual sense allocation strategy for dynamic bandwidth allocation, it is shown that the AllocTC-Sharing model preserves bandwidth constraints for traffic classes and improves overall link utilization.
Submitted 16 April, 2019;
originally announced April 2019.
-
A Method to Assess and Argue for Practical Significance in Software Engineering
Authors:
Richard Torkar,
Carlo A. Furia,
Robert Feldt,
Francisco Gomes de Oliveira Neto,
Lucas Gren,
Per Lenberg,
Neil A. Ernst
Abstract:
A key goal of empirical research in software engineering is to assess practical significance, which answers whether the observed effects of some compared treatments show a relevant difference in practice in realistic scenarios. Even though plenty of standard techniques exist to assess statistical significance, connecting it to practical significance is not straightforward or routinely done; indeed, only a few empirical studies in software engineering assess practical significance in a principled and systematic way.
In this paper, we argue that Bayesian data analysis provides suitable tools to assess practical significance rigorously. We demonstrate our claims in a case study comparing different test techniques. The case study's data was previously analyzed (Afzal et al., 2015) using standard techniques focusing on statistical significance. Here, we build a multilevel model of the same data, which we fit and validate using Bayesian techniques. Our method is to apply cumulative prospect theory on top of the statistical model to quantitatively connect our statistical analysis output to a practically meaningful context. This is then the basis both for assessing and arguing for practical significance.
Our study demonstrates that Bayesian analysis provides a technically rigorous yet practical framework for empirical software engineering. A substantial side effect is that any uncertainty in the underlying data will be propagated through the statistical model, and its effects on practical significance are made clear.
Thus, in combination with cumulative prospect theory, Bayesian analysis supports seamlessly assessing practical significance in an empirical software engineering context, thus potentially clarifying and extending the relevance of research for practitioners.
Submitted 25 December, 2020; v1 submitted 26 September, 2018;
originally announced September 2018.
-
Visualizing test diversity to support test optimisation
Authors:
Francisco Gomes de Oliveira Neto,
Robert Feldt,
Linda Erlenhov,
José Benardi de Souza Nunes
Abstract:
Diversity has been used as an effective criterion to optimise test suites for cost-effective testing. Particularly, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feed back diversity information to developers and testers since results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) what are the trade-offs in using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers for test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions to improve. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.
Submitted 17 July, 2018; v1 submitted 15 July, 2018;
originally announced July 2018.
-
A Testability Analysis Framework for Non-Functional Properties
Authors:
Michael Felderer,
Bogdan Marculescu,
Francisco Gomes de Oliveira Neto,
Robert Feldt,
Richard Torkar
Abstract:
This paper presents background, the basic steps and an example for a testability analysis framework for non-functional properties.
Submitted 20 February, 2018;
originally announced February 2018.
-
Driving Simulator Platform for Development and Evaluation of Safety and Emergency Systems
Authors:
Andrés E. Gómez,
Tiago C. dos Santos,
Carlos M. Massera,
Arthur de M. Neto,
Denis F. Wolf
Abstract:
According to data from the United Nations, more than 3000 people die each day worldwide due to road traffic collisions. Recent research suggests that human error may be considered the main cause of these fatalities. Because of this, researchers seek alternatives to transfer vehicle control from people to autonomous systems. However, providing this technological innovation to the public may pose complex challenges in the legal, economic and technological areas. Consequently, carmakers and researchers have divided driving automation into safety and emergency systems that improve the driver's perception of the road. This may reduce human error. Therefore, the main contribution of this study is to propose a driving simulator platform to develop and evaluate safety and emergency systems in the first design stage. This driving simulator platform has an advantage: a flexible software structure. This allows the simulation to be adapted for the development or evaluation of a system. The proposed driving simulator platform was tested in two applications: cooperative vehicle system development and the evaluation of the influence of a Driving Assistance System (DAS) on a driver. In the cooperative vehicle system development, the results obtained show that the increase in the time delay in the communication among vehicles (V2V) is a determining factor for system performance. On the other hand, in the evaluation of the influence of a DAS on a driver, it was possible to conclude that the DAS model does not have the level of influence on a driver necessary to avoid an accident.
Submitted 1 February, 2018;
originally announced February 2018.