Tracking the 2024 US Presidential Election Chatter on Tiktok: A Public Multimodal Dataset

Authors: Gabriela Pinto, Charles Bickham, Tanishq Salkar, Luca Luceri, Emilio Ferrara

Abstract: This paper documents our release of a large-scale data collection of TikTok posts related to the upcoming 2024 U.S. Presidential Election. Our current data comprises 1.8 million videos published between November 1, 2023, and May 26, 2024. Its exploratory analysis identifies the most common keywords, hashtags, and bigrams in both Spanish and English posts, focusing on the election and the two main Presidential candidates, President Joe Biden and Donald Trump. We utilized the TikTok Research API, incorporating various election-related keywords and hashtags, to capture the full scope of relevant content. To address the limitations of the TikTok Research API, we also employed third-party scrapers to expand our dataset. The dataset is publicly available at https://github.com/gabbypinto/US2024PresElectionTikToks

Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: The 2024 Election Integrity Initiative

Report number: HUMANS Lab -- Working Paper No. 2024.3

arXiv:2405.03848 [pdf, other]

CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities

Authors: Kingsley Nweye, Kathryn Kaspar, Giacomo Buscemi, Tiago Fonseca, Giuseppe Pinto, Dipanjan Ghose, Satvik Duddukuru, Pavani Pratapa, Han Li, Javad Mohammadi, Luis Lino Ferreira, Tianzhen Hong, Mohamed Ouf, Alfonso Capozzoli, Zoltan Nagy

Abstract: As more distributed energy resources become part of the demand-side infrastructure, it is important to quantify the energy flexibility they provide on a community scale, particularly to understand the impact of geographic, climatic, and occupant behavioral differences on their effectiveness, as well as identify the best control strategies to accelerate their real-world adoption. CityLearn provides an environment for benchmarking simple and advanced distributed energy resource control algorithms including rule-based, model-predictive, and reinforcement learning control. CityLearn v2 presented here extends CityLearn v1 by providing a simulation environment that leverages the End-Use Load Profiles for the U.S. Building Stock dataset to create virtual grid-interactive communities for resilient, multi-agent distributed energy resources and objective control with dynamic occupant feedback. This work details the v2 environment design and provides application examples that utilize reinforcement learning to manage battery energy storage system charging/discharging cycles, vehicle-to-grid control, and thermal comfort during heat pump power modulation.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2402.05882 [pdf, other]

GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru

Authors: Gabriela Pinto, Keith Burghardt, Kristina Lerman, Emilio Ferrara

Abstract: TikTok is one of the largest and fastest-growing social media sites in the world. TikTok features, however, such as voice transcripts, are often missing and other important features, such as OCR or video descriptions, do not exist. We introduce the Generative AI Enriched TikTok (GET-Tok) data, a pipeline for collecting TikTok videos and enriched data by augmenting the TikTok Research API with generative AI models. As a case study, we collect videos about the attempted coup in Peru initiated by its former President, Pedro Castillo, and its accompanying protests. The data includes information on 43,697 videos published from November 20, 2022 to March 1, 2023 (102 days). Generative AI augments the collected data via transcripts of TikTok videos, text descriptions of what is shown in the videos, what text is displayed within the video, and the stances expressed in the video. Overall, this pipeline will contribute to a better understanding of online discussion in a multimodal setting with applications of Generative AI, especially outlining the utility of this pipeline in non-English-language social media. Our code used to produce the pipeline is in a public Github repository: https://github.com/gabbypinto/GET-Tok-Peru.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Github repository: https://github.com/gabbypinto/GET-Tok-Peru

arXiv:2402.01573 [pdf, other]

An Actionable Framework for Understanding and Improving Talent Retention as a Competitive Advantage in IT Organizations

Authors: Luiz Alexandre Costa, Edson Dias, Danilo Monteiro Ribeiro, Awdren Fontão, Gustavo Pinto, Rodrigo Pereira dos Santos, Alexander Serebrenik

Abstract: In the rapidly evolving global business landscape, the demand for software has intensified competition among organizations, leading to challenges in retaining highly qualified IT members in software organizations. One of the problems faced by IT organizations is the retention of these strategic professionals, also known as talent. This work presents an actionable framework for Talent Retention (TR) used in IT organizations. It is based on our findings from interviews performed with 21 IT managers. The TR Framework is our main research outcome. Our framework encompasses a set of factors, contextual characteristics, barriers, strategies, and coping mechanisms. Our findings indicated that software engineers can be differentiated from other professional groups, and beyond competitive salaries, other elements for retaining talent in IT organizations should be considered, such as psychological safety, work-life balance, a positive work environment, innovative and challenging projects, and flexible work. A better understanding of factors could guide IT managers in improving talent management processes by addressing Software Engineering challenges, identifying important elements, and exploring strategies at the individual, team, and organizational levels.

Submitted 24 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2205.06352 by other authors

arXiv:2401.09252 [pdf, other]

doi 10.1145/3519021

3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

Authors: Thiago Lopes Trugillo da Silveira, Paulo Gamarra Lessa Pinto, Jeffri Erwin Murrugarra Llerena, Claudio Rosito Jung

Abstract: This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360$^\circ$, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Published in ACM Computing Surveys

Journal ref: ACM Comput. Surv. 55, 4, Article 68, 2023

arXiv:2311.18452 [pdf, other]

Developer Experiences with a Contextualized AI Coding Assistant: Usability, Expectations, and Outcomes

Authors: Gustavo Pinto, Cleidson de Souza, Thayssa Rocha, Igor Steinmacher, Alberto de Souza, Edward Monteiro

Abstract: In the rapidly advancing field of artificial intelligence, software development has emerged as a key area of innovation. Despite the plethora of general-purpose AI assistants available, their effectiveness diminishes in complex, domain-specific scenarios. Noting this limitation, both the academic community and industry players are relying on contextualized coding AI assistants. These assistants surpass general-purpose AI tools by integrating proprietary, domain-specific knowledge, offering precise and relevant solutions. Our study focuses on the initial experiences of 62 participants who used a contextualized coding AI assistant -- named StackSpot AI -- in a controlled setting. According to the participants, the assistants' use resulted in significant time savings, easier access to documentation, and the generation of accurate codes for internal APIs. However, challenges associated with the knowledge sources necessary to make the coding assistant access more contextual information as well as variable responses and limitations in handling complex codes were observed. The study's findings, detailing both the benefits and challenges of contextualized AI assistants, underscore their potential to revolutionize software development practices, while also highlighting areas for further refinement.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18450 [pdf, other]

Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant

Authors: Gustavo Pinto, Cleidson de Souza, João Batista Neto, Alberto de Souza, Tarcísio Gotto, Edward Monteiro

Abstract: With their exceptional natural language processing capabilities, tools based on Large Language Models (LLMs) like ChatGPT and Co-Pilot have swiftly become indispensable resources in the software developer's toolkit. While recent studies suggest the potential productivity gains these tools can unlock, users still encounter drawbacks, such as generic or incorrect answers. Additionally, the pursuit of improved responses often leads to extensive prompt engineering efforts, diverting valuable time from writing code that delivers actual value. To address these challenges, a new breed of tools, built atop LLMs, is emerging. These tools aim to mitigate drawbacks by employing techniques like fine-tuning or enriching user prompts with contextualized information. In this paper, we delve into the lessons learned by a software development team venturing into the creation of such a contextualized LLM-based application, using retrieval-based techniques, called CodeBuddy. Over a four-month period, the team, despite lacking prior professional experience in LLM-based applications, built the product from scratch. Following the initial product release, we engaged with the development team responsible for the code generative components. Through interviews and analysis of the application's issue tracker, we uncover various intriguing challenges that teams working on LLM-based applications might encounter. For instance, we found three main groups of lessons: LLM-based lessons, User-based lessons, and Technical lessons. By understanding these lessons, software development teams could become better prepared to build LLM-based applications.

Submitted 4 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2307.16696 [pdf, other]

Large Language Models for Education: Grading Open-Ended Questions Using ChatGPT

Authors: Gustavo Pinto, Isadora Cardoso-Pereira, Danilo Monteiro Ribeiro, Danilo Lucena, Alberto de Souza, Kiev Gama

Abstract: As a way of addressing increasingly sophisticated problems, software professionals face the constant challenge of seeking improvement. However, for these individuals to enhance their skills, their process of studying and training must involve feedback that is both immediate and accurate. In the context of software companies, where the scale of professionals undergoing training is large, but the number of qualified professionals available for providing corrections is small, delivering effective feedback becomes even more challenging. To circumvent this challenge, this work presents an exploration of using Large Language Models (LLMs) to support the correction process of open-ended questions in technical training. In this study, we utilized ChatGPT to correct open-ended questions answered by 42 industry professionals on two topics. Evaluating the corrections and feedback provided by ChatGPT, we observed that it is capable of identifying semantic details in responses that other metrics cannot observe. Furthermore, we noticed that, in general, subject matter experts tended to agree with the corrections and feedback given by ChatGPT.

Submitted 1 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: 10 pages, 2 figures

Journal ref: SBES EDU Track, 2023

arXiv:2305.17106 [pdf, other]

Understanding Self-Efficacy in the Context of Software Engineering: A Qualitative Study in the Industry

Authors: Danilo Monteiro Ribeiro, Rayfran Rocha Lima, César França, Alberto de Souza, Isadora Cardoso-Pereira, Gustavo Pinto

Abstract: CONTEXT: Self-efficacy is a concept researched in various areas of knowledge that impacts various factors such as performance, satisfaction, and motivation. In Software Engineering, it has mainly been studied in the academic context, presenting results similar to other areas of knowledge. However, it is also important to understand its impact in the industrial context. OBJECTIVE: Therefore, this study aims to understand the impact on the software development context with a focus on understanding the behavioral signs of self-efficacy in software engineers and how self-efficacy can impact the work-day of software engineers. METHOD: A qualitative study was conducted using semi-structured questionnaires with 31 interviewees from a software development company located in Brazil. The interviewees participated in a Bootcamp and were later assigned to software development teams. Thematic analysis was used to analyze the data. RESULTS: In the perception of the interviewees, 21 signs were found that are related to people with high and low self-efficacy. These signs were divided into two dimensions: social and cognitive. Also, 18 situations were found that can lead to an increase or decrease of self-efficacy of software engineers. Finally, 12 factors were mentioned that can impact software development teams. CONCLUSION: This work evidences a set of behavioral signs that can help team leaders to better perceive the self-efficacy of their members. It also presents a set of situations that both leaders and individuals can use to improve their self-efficacy in the development context, and finally, factors that can be impacted by self-efficacy in the software development context are also presented. Finally, this work emphasizes the importance of understanding self-efficacy in the industrial context.

Submitted 2 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 10 pages, 3 figures

Journal ref: Published at EASE 2023

arXiv:2303.05429 [pdf, other]

Supporting the Careers of Developers with Disabilities: Lessons from Zup Innovation

Authors: Isadora Cardoso-Pereira, Geraldo Gomes, Danilo Monteiro Ribeiro, Alberto de Souza, Danilo Lucena, Gustavo Pinto

Abstract: People with disabilities still face discrimination, which creates significant obstacles to accessing higher education, ultimately hindering their access to high-skilled occupations. In this study we present Catalisa, an eight-month training camp (developed by Zup Innovation) that hires and trains people with disabilities as software developers. We interviewed 12 Catalisa participants to better understand their challenges and limitations regarding inclusion and accessibility. We offer four recommendations to improve inclusion and accessibility in Catalisa-like programs, which we hope could motivate others to build a more inclusive and equitable workplace that benefits everyone.

Submitted 26 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: 5 pages (two columns), 1 figure

arXiv:2210.07342 [pdf, other]

Cognitive-Driven Development Helps Software Teams to Keep Code Units Under the Limit!

Authors: Gustavo Pinto, Alberto de Souza

Abstract: Software design techniques are undoubtedly crucial in the process of designing good software. Over the years, a large number of design techniques have been proposed by both researchers and practitioners. Unfortunately, despite their uniqueness, it is not uncommon to find software products that make subpar design decisions, leading to design degradation challenges. One potential reason for this behavior is that developers do not have a clear vision of how much a code unit could grow; without this vision, a code unit can grow endlessly, even when developers are equipped with an arsenal of design practices. Unlike other design techniques, Cognitive Driven Development (CDD for short) focuses on 1) defining and 2) limiting the number of coding elements that developers could use in a given code unit. In this paper, we report on the experiences of a software development team in using CDD for building from scratch a learning management tool at Zup Innovation, a Brazilian tech company. By curating commit traces left in the repositories, combined with the developers' perception, we organized a set of findings and lessons that could be useful for those interested in adopting CDD. For instance, we noticed that by using CDD, despite the evolution of the product, developers were able to keep the code units small in size. Furthermore, although limiting the complexity is at the heart of CDD, we also discovered that developers tend to relax this notion of limit so that they can cope with the different complexities of the software. Still, we noticed that CDD could also influence testing practices; limiting the code units' size makes testing easier to perform.

Submitted 24 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: 28 pages, submitted to JSS (in Practice Track)

arXiv:2208.00269 [pdf, ps, other]

Automatically Categorising GitHub Repositories by Application Domain

Authors: Francisco Zanartu, Christoph Treude, Bruno Cartaxo, Hudson Silva Borges, Pedro Moura, Markus Wagner, Gustavo Pinto

Abstract: GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier for categorising repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute with (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.

Submitted 30 July, 2022; originally announced August 2022.

arXiv:2206.10655 [pdf, other]

To What Extent Cognitive-Driven Development Improves Code Readability?

Authors: Leonardo Barbosa, Victor Hugo Santiago, Alberto Luiz Oliveira Tavares de Souza, Gustavo Pinto

Abstract: Cognitive-Driven Development (CDD) is a coding design technique that aims to reduce the cognitive effort that developers place in understanding a given code unit (e.g., a class). By following CDD design practices, the coding units are expected to be smaller, and, thus, easier to maintain and evolve. However, it is so far unknown whether these smaller code units coded using CDD standards are, indeed, easier to understand. In this work we aim to assess to what extent CDD improves code readability. To achieve this goal, we conducted a two-phase study. We start by inviting professional software developers to vote (and justify their rationale) on the most readable pair of code snippets (from a set of 10 pairs); one of the pairs was coded using CDD practices. We received 133 answers. In the second phase, we applied the state-of-the-art readability model on the 10 pairs of CDD-guided refactorings. We observed some conflicting results. On the one hand, developers perceived that seven (out of 10) CDD-guided refactorings were more readable than their counterparts; for two other CDD-guided refactorings, developers were undecided, while only in one of the CDD-guided refactorings, developers preferred the original code snippet. On the other hand, we noticed that only one CDD-guided refactoring has better readability, as assessed by state-of-the-art readability models. Our results provide initial evidence that CDD could be an interesting approach for software design.

Submitted 21 June, 2022; originally announced June 2022.

Comments: 10 pages, 7 figures

arXiv:2204.12274 [pdf, other]

Socio-technical constraints and affordances of virtual collaboration -- A study of four online hackathons

Authors: Wendy Mendes, Albert Richard, Tähe-Kai Tillo, Gustavo Pinto, Kiev Gama, Alexander Nolte

Abstract: Hackathons and similar time-bounded events have become a popular form of collaboration. They are commonly organized as in-person events during which teams engage in intense collaboration over a short period of time to complete a project that is of interest to them. Most research to date has focused on studying how teams collaborate in a co-located setting, pointing towards the advantages of radical co-location. The global pandemic of 2020, however, has led to many hackathons moving online, which challenges our current understanding of how they function. In this paper, we address this gap by presenting findings from a multiple-case study of 10 hackathon teams that participated in 4 hackathons across two continents. By analyzing the collected data, we found that teams merged synchronous and asynchronous means of communication to maintain a common understanding of work progress as well as to maintain awareness of each other's tasks. Task division was self-assigned based on individual skills or interests, while leaders emerged from different strategies (e.g., participant experience, the responsibility of registering the team in an event). Some of the affordances of in-person hackathons, such as the radical co-location of team members, could be partially reproduced in teams that kept synchronous communication channels while working (i.e., shared audio territories), in a sort of "radical virtual co-location". However, others, such as interactions with other teams, easy access to mentors, and networking with other participants, decreased. In addition, the technical constraints of the different communication tools and platforms brought technical problems and were overwhelming to participants. Our work contributes to understanding the virtual collaboration of small teams in the context of online hackathons and how technologies and event structures proposed by organizers imply this collaboration.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Accepted in Proceedings of the ACM on Human Computer Interaction (CSCW'22)

arXiv:2110.12241 [pdf, ps, other]

Changing Software Engineers' Self-Efficacy with Bootcamps: A Research Proposal

Authors: Danilo Monteiro Ribeiro, Alberto Souza, Victor Santiago, Danilo Lucena, Geraldo Gomes, Gustavo Pinto

Abstract: In several areas of knowledge, self-efficacy is related to the performance of individuals, including in Software Engineering. However, it is not clear how self-efficacy can be modified in training conducted by the industry. Furthermore, we still do not understand how self-efficacy can impact an individual's team and career in the industry. This lack of understanding can negatively impact how companies and individuals perceive the importance of self-efficacy in the field. Therefore, we present a research proposal that aims to understand the relationship between self-efficacy and training in Software Engineering. Moreover, we look to understand the role of self-efficacy in the software development industry. We propose a longitudinal case study with software engineers at Zup Innovation who are participating in our bootcamp training. We expect to collect data to support our assumptions that self-efficacy can be related to training in Software Engineering. The other assumption is that self-efficacy at the beginning of training is higher than in the middle, and that self-efficacy at the end of training is higher than in the middle. We expect that the study proposed in this article will motivate a discussion about self-efficacy and the importance of training employers in the industry of software development.

Submitted 26 October, 2021; v1 submitted 23 October, 2021; originally announced October 2021.

Comments: 7 pages, 0 figures, SEET

arXiv:2107.05792 [pdf, other]

What Evidence We Would Miss If We Do Not Use Grey Literature?

Authors: Fernando Kamei, Gustavo Pinto, Igor Wiese, Márcio Ribeiro, Sérgio Soares

Abstract: Context: In recent years, Grey Literature (GL) has been gaining increasing attention in Secondary Studies in Software Engineering (SE). Notably, Multivocal Literature Review (MLR) studies, which search for evidence in both Traditional Literature (TL) and GL, are particularly benefiting from this rise of GL content. Despite the growing interest in MLR-based studies, the literature assessing how GL has contributed to MLR studies is still scarce. Objective: This research aims to assess how the use of GL contributed to MLR studies. By contributing, we mean understanding to what extent GL is providing evidence that is indeed used by an MLR to answer its research question. Method: We conducted a tertiary study to identify MLR studies published between 2017 and 2019, selecting nine MLR studies. Using qualitative and quantitative analysis, we identified the GL used and assessed to what extent these MLRs are contributing to MLR studies. Results: Our analysis identified that 1) GL provided evidence not found in TL, 2) most of the GL sources were used to provide recommendations to solve problems, explain a topic, and classify the findings, and 3) 19 different GL types were used in the studies; these GLs were mainly produced by SE practitioners (including blog posts, slide presentations, or project descriptions). Conclusions: We provide evidence of how GL contributed to MLR studies. We observed that if these GLs were not included in the MLR, several findings would have been omitted or weakened. We also described the challenges involved when conducting this investigation, along with potential ways to deal with them, which may help future SE researchers.

Submitted 17 August, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2106.11160 [pdf, other]

Effects of boundary conditions in fully convolutional networks for learning spatio-temporal dynamics

Authors: Antonio Alguacil, Wagner Gonçalves Pinto, Michael Bauerheim, Marc C. Jacob, Stéphane Moreau

Abstract: Accurate modeling of boundary conditions is crucial in computational physics. The ever increasing use of neural networks as surrogates for physics-related problems calls for an improved understanding of boundary condition treatment, and its influence on the network accuracy. In this paper, several strategies to impose boundary conditions (namely padding, improved spatial context, and explicit encoding of physical boundaries) are investigated in the context of fully convolutional networks applied to recurrent tasks. These strategies are evaluated on two spatio-temporal evolving problems modeled by partial differential equations: the 2D propagation of acoustic waves (hyperbolic PDE) and the heat equation (parabolic PDE). Results reveal a high sensitivity of both accuracy and stability on the boundary implementation in such recurrent tasks. It is then demonstrated that the choice of the optimal padding strategy is directly linked to the data semantics. Furthermore, the inclusion of additional input spatial context or explicit physics-based rules allows a better handling of boundaries in particular for large number of recurrences, resulting in more robust and stable neural networks, while facilitating the design and versatility of such networks.

Submitted 4 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 16 pages, 8 figures, submitted to ECML PKDD 2021 Conference

arXiv:2105.05482 [pdf, other]

On the reproducibility of fully convolutional neural networks for modeling time-space evolving physical systems

Authors: Wagner Gonçalves Pinto, Antonio Alguacil, Michaël Bauerheim

Abstract: Reproducibility of a deep-learning fully convolutional neural network is evaluated by training the same network several times under identical conditions (database, hyperparameters, hardware) with non-deterministic Graphics Processing Unit (GPU) operations. The propagation of two-dimensional acoustic waves, typical of time-space evolving physical systems, is studied on both recursive and non-recursive tasks. Significant changes in model properties (weights, featured fields) are observed. When tested on various propagation benchmarks, these models systematically returned estimations with a high level of deviation, especially for the recurrent analysis, which strongly amplifies variability due to the non-determinism. Trainings performed with double floating-point precision provide slightly better estimations and a significant reduction of the variability of both the network parameters and its testing error range.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2104.13435 [pdf, other]

Grey Literature in Software Engineering: A Critical Review

Authors: Fernando Kamei, Igor Wiese, Crescencio Lima, Ivanilton Polato, Vilmar Nepomuceno, Waldemar Ferreira, Márcio Ribeiro, Carolline Pena, Bruno Cartaxo, Gustavo Pinto, Sérgio Soares

Abstract: Context: Grey Literature (GL) has recently grown in Software Engineering (SE) research with the increased use of online communication channels by software engineers. However, there is still a limited understanding of how SE research is taking advantage of GL. Objective: This research aimed to understand how SE researchers use GL in their secondary studies. Method: We conducted a tertiary study of studies published between 2011 and 2018 in high-quality software engineering conferences and journals. We then applied qualitative and quantitative analysis to investigate 446 potential studies. Results: From the 446 selected studies, 126 studies cited GL, but only 95 of those used GL to answer a specific research question, representing almost 21% of all the 446 secondary studies. Interestingly, we identified that few studies employed specific search mechanisms and used additional criteria for assessing GL. Moreover, by the time we conducted this research, 49% of the GL URLs were no longer working. Based on our findings, we discuss some challenges in using GL and potential mitigation plans. Conclusion: In this paper, we summarized the last 10 years of software engineering research that uses GL, showing that GL has been essential for bringing practical new perspectives that are scarce in traditional literature. By drawing the current landscape of use, we also raise some awareness of related challenges (and strategies to deal with them).

Submitted 12 May, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: 29 pages, 9 figures

arXiv:2103.13762 [pdf, other]

Mining Energy-Related Practices in Robotics Software

Authors: Michel Albonico, Ivano Malavolta, Gustavo Pinto, Emitza Guzman, Katerina Chinnappan, Patricia Lago

Abstract: Robots are becoming more and more commonplace in many industry settings. This successful adoption can be partly attributed to (1) their increasingly affordable cost and (2) the possibility of developing intelligent, software-driven robots. Unfortunately, robotics software consumes significant amounts of energy. Moreover, robots are often battery-driven, meaning that even a small energy improvement can help reduce their energy footprint and increase their autonomy and user experience. In this paper, we study the Robot Operating System (ROS) ecosystem, the de-facto standard for developing and prototyping robotics software. We analyze 527 energy-related data points (including commits, pull-requests, and issues on ROS-related repositories, ROS-related questions on StackOverflow, ROS Discourse, ROS Answers, and the official ROS Wiki). Our results include a quantification of the interest of roboticists in software energy efficiency, 10 recurrent causes and 14 solutions of energy-related issues, and their implied trade-offs with respect to other quality attributes. Those contributions support roboticists and researchers towards having energy-efficient software in future robotics projects.

Submitted 25 March, 2021; originally announced March 2021.

Comments: 11 pages

Journal ref: MSR 2021

arXiv:2012.07140 [pdf, other]

How Trans-Inclusive are Hackathons?

Authors: Rafa Prado, Wendy Galeno, Kiev Gama, Gustavo Pinto

Abstract: Hackathons are fun! People go there to learn, meet new colleagues, intensively work on a collaborative project, and mix pizza with energy drinks. However, for the transgender community and other minorities, hackathons can have an uncomfortable atmosphere. Some transgender and non-conforming people, although enjoying hackathons, decided not to participate anymore, afraid of LGBQTPhobia and other discomforts. In this paper we surveyed 44 trans and cis hackathon participants and interviewed seven transgender ones. By understanding their needs and challenges, we introduce five recommendations to make hackathons more inclusive.

Submitted 15 December, 2020; v1 submitted 13 December, 2020; originally announced December 2020.

Comments: 4 pages, 1 figure

Journal ref: IEEE Software 2021

arXiv:2012.05016 [pdf, other]

From One to Hundreds: Multi-Licensing in the JavaScript Ecosystem

Authors: João Pedro Moraes, Ivanilton Polato, Igor Wiese, Filipe Saraiva, Gustavo Pinto

Abstract: Open source licenses create a legal framework that plays a crucial role in the widespread adoption of open source projects. Without a license, any source code available on the internet could not be openly (re)distributed. Although recent studies provide evidence that most popular open source projects have a license, developers might lack confidence or expertise when they need to combine software licenses, leading to a mistaken project license unification. This license usage is challenged by the high degree of reuse that occurs in the heart of modern software development practices, in which third-party libraries and frameworks are easily and quickly integrated into a software codebase. This scenario creates what we call "multi-licensed" projects, which happens when one project has components that are licensed under more than one license. Although these components exist at the file-level, they naturally impact licensing decisions at the project-level. In this paper, we conducted a mixed-method study to shed some light on these questions. We started by parsing 1,426,263 (source code and non-source code) files available on 1,552 JavaScript projects, looking for license information. Among these projects, we observed that 947 projects (61%) employ more than one license. On average, there are 4.7 licenses per studied project (max: 256). Among the reasons for multi-licensing is to incorporate the source code of third-party libraries into the project's codebase. When doing so, we observed that 373 of the multi-licensed projects introduced at least one license incompatibility issue. We also surveyed 83 maintainers of these projects to cross-validate our findings. We observed that 63% of the surveyed maintainers are not aware of the multi-licensing implications. Those who are aware adopt multiple licenses mostly to conform with third-party libraries' licenses.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Submitted to EMSE, 33 pages

arXiv:2012.03759 [pdf, other]

Exposing Bugs in JavaScript Engines through Test Transplantation and Differential Testing

Authors: Igor Lima, Jefferson Silva, Breno Miranda, Gustavo Pinto, Marcelo d'Amorim

Abstract: Context. JavaScript is a popular programming language today with several implementations competing for market dominance. Although a specification document and a conformance test suite exist to guide engine development, bugs occur and have important practical consequences. Implementing correct engines is challenging because the spec is intentionally incomplete and evolves frequently. Objective. This paper investigates the use of test transplantation and differential testing for revealing functional bugs in JavaScript engines. The former technique runs the regression test suite of a given engine on another engine. The latter technique fuzzes existing inputs and then compares the output produced by different engines with a differential oracle. Method. We conducted experiments with engines from five major players (Apple, Facebook, Google, Microsoft, and Mozilla) to assess the effectiveness of test transplantation and differential testing. Results. Our results indicate that both techniques revealed several bugs, many of which were confirmed by developers. We reported 35 bugs with test transplantation (23 of these bugs confirmed and 19 fixed) and reported 24 bugs with differential testing (17 of these confirmed and 10 fixed). Results indicate that most of these bugs affected two engines, Apple's JSC and Microsoft's ChakraCore (24 and 26 bugs, respectively). To summarize, our results show that test transplantation and differential testing are easy to apply and very effective in finding bugs in complex software, such as JavaScript engines.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 32 pages, 2 figures

Journal ref: Software Quality Journal 2021

arXiv:2012.03738 [pdf, other]

Small Changes, Big Impacts: Leveraging Diversity to Improve Energy Efficiency

Authors: Wellington Oliveira, Hugo Matalonga, Gustavo Pinto, Fernando Castor, João Paulo Fernandes

Abstract: In the last few years, a growing body of research has proposed methods, techniques, and tools to support developers in the construction of software that consumes less energy. These solutions leverage diverse approaches such as version history mining, analytical models, identifying energy-efficient color schemes, and optimizing the packaging of HTTP requests. In this chapter, we present a complementary approach. We advocate that developers should leverage software diversity to make software systems more energy-efficient. Our main insight is that non-specialists can build software that consumes less energy by alternating at development time between readily available, diversely-designed pieces of software implemented by third-parties. These pieces of software can vary in nature, granularity, and quality attributes. Examples include data structures and constructs for thread management and synchronization.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 31 pages, 7 figures

arXiv:2012.03716 [pdf, other]

How Successful Are Open Source Contributions From Countries with Different Levels of Human Development?

Authors: Leonardo Furtado, Bruno Cartaxo, Christoph Treude, Gustavo Pinto

Abstract: Are Brazilian developers less likely to have a contribution accepted than their peers from, say, the United Kingdom? In this paper we studied whether the developers' location relates to the outcome of a pull request. We curated the locations of 14k contributors who performed 44k pull requests to 20 open source projects. Our results indeed suggest that developers from countries with low human development indexes (HDI) not only perform a small fraction of the overall pull requests, but they also are the ones that face rejection the most.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 5 pages, 1 figure

Journal ref: IEEE Software 2021

arXiv:2009.05926 [pdf, other]

On the Use of Grey Literature: A Survey with the Brazilian Software Engineering Research Community

Authors: Fernando Kamei, Igor Wiese, Gustavo Pinto, Márcio Ribeiro, Sérgio Soares

Abstract: Background: The use of Grey Literature (GL) has been investigated in diverse research areas. In Software Engineering (SE), this topic has attracted increasing interest in recent years. Problem: Even with the increase of GL published in diverse sources, the understanding of its use in the SE research community is still controversial. Objective: To understand how Brazilian SE researchers use GL, we aimed to become aware of the criteria to assess the credibility of its use, as well as the benefits and challenges. Method: We surveyed 76 active SE researchers, participants of a flagship SE conference in Brazil, using a questionnaire with 11 questions to share their views on the use of GL in the context of SE research. We followed a qualitative approach to analyze open questions. Results: We found that most surveyed researchers use GL mainly to understand new topics. Our work identified new findings, including: 1) GL sources used by SE researchers (e.g., blogs, community websites); 2) motivations to use GL (e.g., to understand problems and to complement research findings) or reasons to avoid it (e.g., lack of reliability, lack of scientific value); 3) the benefit that GL is easy to access and read, and the challenge of having its scientific value recognized; and 4) criteria to assess GL credibility, showing the importance of the content owner being renowned (e.g., renowned authors and institutions). Conclusions: Our findings contribute to forming a body of knowledge on the use of GL by SE researchers, by discussing novel (some contradictory) results and providing a set of lessons learned to both SE researchers and practitioners.

Submitted 13 September, 2020; originally announced September 2020.

arXiv:2008.08652 [pdf, other]

The Organization of Software Teams in the Quest for Continuous Delivery: A Grounded Theory Approach

Authors: Leonardo Leite, Gustavo Pinto, Fabio Kon, Paulo Meirelles

Abstract: Context: To accelerate time-to-market and improve customer satisfaction, software-producing organizations have adopted continuous delivery practices, impacting the relations between development and infrastructure professionals. Yet, no substantial literature has tackled how the software industry structures the organization of development and infrastructure teams. Objective: In this study, we investigate how software-producing organizations structure their development and infrastructure teams, specifically how labor is divided among these groups and how they interact. Method: After brainstorming with 7 DevOps experts to better formulate our research and procedures, we collected and analyzed data from 37 semi-structured interviews with IT professionals, following Grounded Theory guidelines. Results: After a careful analysis, we identified four common organizational structures: (1) siloed departments, (2) classical DevOps, (3) cross-functional teams, and (4) platform teams. We also observed that some companies are transitioning between these structures. Conclusion: The main contribution of this study is a theory in the form of a taxonomy that organizes the found structures along with their properties. This theory could guide researchers and practitioners to think about how to better structure development and infrastructure professionals in software-producing organizations.

Submitted 23 June, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: Version accepted for publication in the Information and Software Technology journal (Jun, 2021) / CC-BY-NC-ND license / affiliation of last author changed

ACM Class: D.2

arXiv:2007.13891 [pdf, other]

Work Practices and Perceptions from Women Core Developers in OSS Communities

Authors: Edna Dias Canedo, Rodrigo Bonifácio, Márcio Vinícius Okimoto, Alexander Serebrenik, Gustavo Pinto, Eduardo Monteiro

Abstract: The effect of gender diversity in open source communities has gained increasing attention from practitioners and researchers. For instance, organizations such as the Python Software Foundation and the OpenStack Foundation started actions to increase gender diversity and promote women to top positions in the communities. Although the general underrepresentation of women (a.k.a. horizontal segregation) in open source communities has been explored in a number of research studies, little is known about the vertical segregation in open source communities -- which occurs when there are fewer women in high-level positions. To address this research gap, in this paper we present the results of a mixed-methods study on gender diversity and work practices of core developers contributing to open-source communities. In the first study, we used mining software repositories procedures to identify the core developers of 711 open source projects, in order to understand how common women core developers are in open source communities and to characterize their work practices. In the second study, we surveyed the women core developers we identified in the first study to collect their perceptions of gender diversity and gender bias they might have observed while contributing to open source systems. Our findings show that open source communities present both horizontal and vertical segregation (only 2.3% of the core developers are women). Nevertheless, differently from previous studies, most of the women core developers (65.7%) report never having experienced gender discrimination when contributing to an open source project. Finally, we did not note substantial differences between the work practices of women and men core developers. We reflect on these findings and present some ideas that might increase the participation of women in open source communities.

Submitted 27 July, 2020; originally announced July 2020.

Comments: Preprint of our paper published at ESEM 2020

arXiv:2006.03611 [pdf, other]

Neuropsychiatric Disease Classification Using Functional Connectomics -- Results of the Connectomics in NeuroImaging Transfer Learning Challenge

Authors: Markus D. Schirmer, Archana Venkataraman, Islem Rekik, Minjeong Kim, Stewart H. Mostofsky, Mary Beth Nebel, Keri Rosch, Karen Seymour, Deana Crocetti, Hassna Irzan, Michael Hütel, Sebastien Ourselin, Neil Marlow, Andrew Melbourne, Egor Levchenko, Shuo Zhou, Mwiza Kunda, Haiping Lu, Nicha C. Dvornek, Juntang Zhuang, Gideon Pinto, Sandip Samal, Jennings Zhang, Jorge L. Bernal-Rusiel, Rudolph Pienaar, et al. (1 additional author not shown)

Abstract: Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging Transfer Learning Challenge (CNI-TLC), held in conjunction with MICCAI 2019. CNI-TLC included two classification tasks: (1) diagnosis of Attention-Deficit/Hyperactivity Disorder (ADHD) within a pre-adolescent cohort; and (2) transference of the ADHD model to a related cohort of Autism Spectrum Disorder (ASD) patients with an ADHD comorbidity. In total, 240 resting-state fMRI time series averaged according to three standard parcellation atlases, along with clinical diagnosis, were released for training and validation (120 neurotypical controls and 120 ADHD). We also provided demographic information of age, sex, IQ, and handedness. A second set of 100 subjects (50 neurotypical controls, 25 ADHD, and 25 ASD with ADHD comorbidity) was used for testing. Models were submitted in a standardized format as Docker images through ChRIS, an open-source image analysis platform. Utilizing an inclusive approach, we ranked the methods based on 16 different metrics. The final rank was calculated using the rank product for each participant across all measures. Furthermore, we assessed the calibration curves of each method. Five participants submitted their model for evaluation, with one outperforming all other methods in both ADHD and ASD classification. However, further improvements are needed to reach the clinical translation of functional connectomics. We are keeping the CNI-TLC open as a publicly available resource for developing and validating new classification methodologies in the field of connectomics.

Submitted 25 November, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: CNI-TLC was held in conjunction with MICCAI 2019

arXiv:2003.10572 [pdf, other]

Characterizing the Roles of Contributors in Open-source Scientific Software Projects

Authors: Reed Milewicz, Gustavo Pinto, Paige Rodeghero

Abstract: The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g., professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g., graduate students) also contribute substantially -- in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing for just one day to the project. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors' behavior, while paving the road for future software engineering research.

Submitted 23 March, 2020; originally announced March 2020.

Comments: 12 pages

arXiv:2003.10006 [pdf, ps, other]

Rapid Reviews in Software Engineering

Authors: Bruno Cartaxo, Gustavo Pinto, Sergio Soares

Abstract: Integrating research evidence into practice is one of the main goals of Evidence-Based Software Engineering (EBSE). Secondary studies, one of the main EBSE products, are intended to summarize the best research evidence and make it easily consumable by practitioners. However, recent studies show that some secondary studies lack connections with software engineering practice. In this chapter, we present the concept of Rapid Reviews, which are lightweight secondary studies focused on delivering evidence to practitioners in a timely manner. Rapid Reviews support practitioners in their decision-making, and should be conducted bounded to a practical problem, inserted into a practical context. Thus, Rapid Reviews can be easily integrated in a knowledge/technology transfer initiative. After describing the basic concepts, we present the results and experiences of conducting two Rapid Reviews. We also provide guidelines to help researchers and practitioners who want to conduct Rapid Reviews, and we finally discuss topics that may concern the research community about the feasibility of Rapid Reviews as an Evidence-Based method. In conclusion, we believe Rapid Reviews might interest researchers and practitioners working in the intersection between software engineering research and practice.

Submitted 22 March, 2020; originally announced March 2020.

Comments: 27 pages

arXiv:2002.00770 [pdf, other]

Analyzing the evolution and diversity of SBES Program Committee

Authors: Fabio Pacheco, Igor Wiese, Bruno Cartaxo, Igor Steinmacher, Gustavo Pinto

Abstract: The Brazilian Symposium on Software Engineering (SBES) is one of the most important Latin American Software Engineering conferences. It was first held in 1987, and 2019 marks its 33rd edition. Over these years, many researchers have participated in SBES, attending the conference, submitting, and reviewing papers. The researchers who participate in the Program Committee (PC) and perform the reviewers' role are fundamentally important to SBES, since their evaluations (e.g., deciding whether a paper is accepted or not) have the potential of drawing what SBES is now. Knowing that diversity is an important aspect of any group work, we wanted to understand diversity in the SBES PC community. We investigated a number of characteristics of SBES PC members, including their gender and geographic location. We also analyzed the turnover and renovation of the committee. Among the findings, we observed that although the number of participants in the SBES PC has increased over the years, most of them are men (~80%) and from the Southeast and Northeast of Brazil, with very few members from the North region. We also observed that there is a small turnover: during the 2010 decade, only 11% of new members were added to the PC. Finally, we investigated the participation of the PC members publishing papers at SBES. We observed that only 24% of the papers accepted to SBES were authored by members who were not committee members of the respective year. Moreover, committee members usually do not collaborate among themselves: a significant number of the papers are authored by the PC members and students. This paper may contribute to the SBES community, in particular, its special interest group, in understanding the needs and challenges of the PC's participants.

Submitted 3 February, 2020; originally announced February 2020.

arXiv:2001.00278 [pdf, ps, other]

Motivic clustering schemes for directed graphs

Authors: Facundo Mémoli, Guilherme Vituri F. Pinto

Abstract: Motivated by the concept of network motifs we construct certain clustering methods (functors) which are parametrized by a given collection of motifs (or representers).

Submitted 6 January, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

Comments: 23 pages

arXiv:1907.01602 [pdf, other]

Continuous Integration Theater

Authors: Wagner Felidré, Leonardo Furtado, Daniel da Costa, Bruno Cartaxo, Gustavo Pinto

Abstract: Background: Continuous Integration (CI) systems are now the bedrock of several software development practices. Several tools such as TravisCI, CircleCI, and Hudson, that implement CI practices, are commonly adopted by software engineers. However, the way that software engineers use these tools could lead to what we call "Continuous Integration Theater", a situation in which software engineers do not employ these tools effectively, leading to unhealthy CI practices. Aims: The goal of this paper is to make sense of how commonplace these unhealthy continuous integration practices are in practice. Method: By inspecting 1,270 open-source projects that use TravisCI, the most used CI service, we quantitatively studied how common it is to use CI (1) with infrequent commits, (2) in a software project with poor test coverage, (3) with builds that stay broken for long periods, and (4) with builds that take too long to run. Results: We observed that 748 (~60%) projects face infrequent commits, which essentially makes the merging process harder. Moreover, we were able to find code coverage information for 51 projects. The average code coverage was 78%, although Ruby projects have a higher code coverage than Java projects (86% and 63%, respectively). However, some projects with very small coverage (~4%) were found. Still, we observed that 85% of the studied projects have at least one broken build that takes more than four days to be fixed. Interestingly, very small projects (up to 1,000 lines of code) are the ones that take the longest to fix broken builds. Finally, we noted that, for the majority of the studied projects, the build is executed under the 10 minutes rule of thumb. Conclusions: Our results are important to an increasing community of software engineers that employ CI practices on a daily basis but may not be aware of bad practices that are eventually employed.

Submitted 2 July, 2019; originally announced July 2019.

Comments: to appear at ESEM 2019

arXiv:1906.11351 [pdf, other]

Software Engineering Research Community Viewpoints on Rapid Reviews

Authors: Bruno Cartaxo, Gustavo Pinto, Baldoino Fonseca, Márcio Ribeiro, Pedro Pinheiro, Sergio Soares, Maria Teresa Baldassarre

Abstract: Background: One of the most important current challenges of Software Engineering (SE) research is to provide relevant evidence to practice. In health related fields, Rapid Reviews (RRs) have been shown to be an effective method to achieve that goal. However, little is known about how the SE research community perceives the potential applicability of RRs. Aims: The goal of this study is to understand the SE research community viewpoints towards the use of RRs as a means to provide evidence to practitioners. Method: To understand their viewpoints, we invited 37 researchers to analyze 50 opinion statements about RRs, and rate them according to what extent they agree with each statement. Q-Methodology was employed to identify the most salient viewpoints, represented by the so-called factors. Results: Four factors were identified: Factor A groups undecided researchers that need more evidence before using RRs; Researchers grouped in Factor B are generally positive about RRs, but highlight the need to define minimum standards; Factor C researchers are more skeptical and reinforce the importance of high quality evidence; Researchers aligned to Factor D have a pragmatic point of view, considering RRs can be applied based on the context and constraints faced by practitioners. Conclusions: In conclusion, although there are opposing viewpoints, there are also some common grounds. For example, all viewpoints agree that both RRs and Systematic Reviews can be poorly or well conducted.

Submitted 26 June, 2019; originally announced June 2019.

Comments: To appear at ESEM 2019. 12 pages

arXiv:1809.05415 [pdf, ps, other]

doi 10.1145/3239235.3240299

Building a Collaborative Culture: A Grounded Theory of Well Succeeded DevOps Adoption in Practice

Authors: Welder Pinheiro Luz, Gustavo Pinto, Rodrigo Bonifácio

Abstract: Background. DevOps is a set of practices and cultural values that aims to reduce the barriers between development and operations teams. Due to its increasing interest and imprecise definitions, existing research works have tried to characterize DevOps, mainly using a set of concepts and related practices. Aims. Nevertheless, little is known about the practitioners' understanding about successful paths for DevOps adoption. The lack of such understanding might hinder institutions from adopting DevOps practices. Therefore, our goal here is to present a theory about DevOps adoption, highlighting the main related concepts that contribute to its adoption in industry. Method. Our work builds upon Classic Grounded Theory. We interviewed practitioners that contributed to DevOps adoption in 15 companies from different domains and across 5 countries. We empirically evaluate our model through a case study, whose goal is to increase the maturity level of DevOps adoption at the Brazilian Federal Court of Accounts, a Brazilian Government institution. Results. This paper presents a model to improve both the understanding and guidance of DevOps adoption. The model increments the existing view of DevOps by explaining the role and motivation of each category (and their relationships) in the DevOps adoption process. We organize this model in terms of DevOps enabler categories and DevOps outcome categories. We provide evidence that collaboration is the core DevOps concern, contrasting with an existing wisdom that implanting specific tools to automate building, deployment, and infrastructure provisioning and management is enough to achieve DevOps. Conclusions. Altogether, our results contribute to (a) generating an adequate understanding of DevOps, from the perspective of practitioners; and (b) assisting other institutions in the migration path towards DevOps adoption.

Submitted 14 September, 2018; originally announced September 2018.

Comments: 11 pages

arXiv:1807.03863 [pdf, other]

doi 10.1007/978-3-030-17065-3_25

Blockchain-based PKI for Crowdsourced IoT Sensor Information

Authors: Guilherme Pinto, João Pedro Dias, Hugo Sereno Ferreira

Abstract: The Internet of Things is progressively getting broader, evolving its scope while creating new markets and adding more to the existing ones. However, both generation and analysis of large amounts of data, which are integral to this concept, may require the proper protection and privacy-awareness of some sensitive information. In order to control the access to this data, allowing devices to verify the reliability of their own interactions with other endpoints of the network is a crucial step to ensure this required safeness. Through the implementation of a blockchain-based Public Key Infrastructure connected to the Keybase platform, it is possible to achieve a simple protocol that binds devices' public keys to their owner accounts, which are respectively supported by identity proofs. The records of this blockchain represent digital signatures performed by these Keybase users on their respective devices' public keys, claiming their ownership. Resorting to this distributed and decentralized PKI, any device is able to autonomously verify the entity in control of a certain node of the network and prevent future interactions with unverified parties.

Submitted 10 July, 2018; originally announced July 2018.

arXiv:1805.01342 [pdf, other]

Open Source Development Around the World: A Comparative Study

Authors: Thais Mombach, Marco Tulio Valente, Cuiting Chen, Magiel Bruntink, Gustavo Pinto

Abstract: Open source software has an increasing importance in our modern society, providing basic services to other software systems and also supporting the rapid development of a variety of end-user applications. Recently, world-wide code sharing platforms, like GitHub, are also contributing to open source's growth. However, little is known on how this growth is distributed around the world and about the characteristics of the projects developed in different countries. In this article, we provide a characterization of 2,648 open source projects developed in 20 countries. We reveal the number of projects per country, the popularity and programming language of each country's project and also show how the number of projects in a country correlates to its GDP. Finally, we assess the maintainability and internal code quality of the studied projects, using a tool called BetterCodeHub.

Submitted 3 May, 2018; originally announced May 2018.

Comments: 11 pages, 8 pages

Showing 1–38 of 38 results for author: Pinto, G