arXiv:2010.00822

doi 10.1109/TSE.2021.3092813

Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS

Authors: Gede Artha Azriadi Prana, Denae Ford, Ayushi Rastogi, David Lo, Rahul Purandare, Nachiappan Nagappan

Abstract: The gender gap is a significant concern facing the software industry as the development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and communities software is now commonly developed in? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigate these trends over time using data from contributions to 21,456 project repositories. We also qualitatively understand the unique experiences of developers contributing to these projects through a survey that is strategically targeted to developers in various regions worldwide. Our findings indicate that gender diversity is low across all parts of the world, with no substantial difference across regions. However, there has been statistically significant improvement in diversity worldwide since 2014, with certain regions such as Africa improving at faster pace. We also find that most motivations and barriers to contributions (e.g., lack of resources to contribute and poor working environment) were shared across regions, however, some insightful differences, such as how to make projects more inclusive, did arise. From these findings, we derive and present implications for tools that can foster inclusion in open source software communities and empower contributions from everyone, everywhere.

Submitted 15 September, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: 19 pages, 16 tables, 3 figures, Includes appendices

Journal ref: IEEE Transactions on Software Engineering 2021

arXiv:1912.07352

Analyzing Offline Social Engagements: An Empirical Study of Meetup Events Related to Software Development

Authors: Abhishek Sharma, Gede Artha Azriadi Prana, Anamika Sawhney, Nachiappan Nagappan, David Lo

Abstract: Software developers use a variety of social media channels and tools in order to keep themselves up to date, collaborate with other developers, and find projects to contribute to. Meetup is one of such social media used by software developers to organize community gatherings. Liu et al. characterized Meetup as an event-based social network (EBSN) which contains valuable offline social interactions in addition to online interactions. Recently, Storey et al. found out that Meetup was one of the social channels used by developers. We in this work investigate in detail the dynamics of Meetup groups and events related to software development, which has not been done in any of the previous works. First, we identified 6,317 Meetup groups related to software development and extracted 185,758 events organized by them. Then we took a statistically significant sample of 452 events on which we performed open coding, based on which we were able to develop 9 categories of events (8 main categories + Others). Next, we did a popularity analysis of the categories of events and found that Talks by Domain Experts, Hands-on Sessions, and Open Discussions are the most popular categories of events organized by Meetup groups related to software development. Our findings show that more popular categories are those where developers can learn and gain knowledge. On doing a diversity analysis of Meetup groups we found 19.82% of the members on an average are female, which is a larger proportion as compared to numbers reported in previous studies on other social media. From a broader software development community point of view information from this new forum can be valuable to identify and understand emerging topics and associations among them which can be helpful to identify future trends as well as current best practices.

Submitted 16 December, 2019; originally announced December 2019.

arXiv:1810.13144

SIEVE: Helping Developers Sift Wheat from Chaff via Cross-Platform Analysis

Authors: Agus Sulistya, Gede Artha Azriadi Prana, Abhishek Sharma, David Lo, Christoph Treude

Abstract: Software developers have benefited from various sources of knowledge such as forums, question-and-answer sites, and social media platforms to help them in various tasks. Extracting software-related knowledge from different platforms involves many challenges. In this paper, we propose an approach to improve the effectiveness of knowledge extraction tasks by performing cross-platform analysis. Our approach is based on transfer representation learning and word embeddings, leveraging information extracted from a source platform which contains rich domain-related content. The information extracted is then used to solve tasks in another platform (considered as target platform) with less domain-related contents. We first build a word embeddings model as a representation learned from the source platform, and use the model to improve the performance of knowledge extraction tasks in the target platform. We experiment with Software Engineering Stack Exchange and Stack Overflow as source platforms, and two different target platforms, i.e., Twitter and YouTube. Our experiments show that our approach improves performance of existing work for the tasks of identifying software-related tweets and helpful YouTube comments.

Submitted 31 October, 2018; originally announced October 2018.

arXiv:1802.06997

Categorizing the Content of GitHub README Files

Authors: Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, David Lo

Abstract: README files play an essential role in shaping a developer's first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical README file as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories and we design and evaluate a classifier and a set of features that can categorize these sections automatically. We find that information discussing the `What' and `How' of a repository is very common, while many README files lack information regarding the purpose and status of a repository. Our multi-label classifier which can predict eight different categories achieves an F1 score of 0.746. To evaluate the usefulness of the classification, we used the automatically determined classes to label sections in GitHub README files using badges and showed files with and without these badges to twenty software professionals. The majority of participants perceived the automated labeling of sections based on our classifier to ease information discovery. This work enables the owners of software repositories to improve the quality of their documentation and it has the potential to make it easier for the software development community to discover relevant information in GitHub README files.

Submitted 30 July, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

Showing 1–4 of 4 results for author: Prana, G A A