
An Empirical Study of Developers' Challenges in Implementing Workflows as Code: A Case Study on Apache Airflow

Authors: Jerin Yasmin, Jiale Wang, Yuan Tian, Bram Adams

Abstract: The Workflows as Code paradigm is becoming increasingly essential to streamline the design and management of complex processes within data-intensive software systems. These systems require robust capabilities to process, analyze, and extract insights from large datasets. Workflow orchestration platforms such as Apache Airflow are pivotal in meeting these needs, as they effectively support the implementation of the Workflows as Code paradigm. Nevertheless, despite its considerable advantages, developers still face challenges due to the specialized demands of workflow orchestration and the complexities of distributed execution environments. In this paper, we manually study 1,000 sampled Stack Overflow posts derived from 9,591 Airflow-related questions to understand developers' challenges and root causes while implementing Workflows as Code. Our analysis results in a hierarchical taxonomy of Airflow-related challenges that contains 7 high-level categories and 14 sub-categories. We find that the most significant obstacles for developers arise when defining and executing their workflow. Our in-depth analysis identifies 10 root causes behind the challenges, including incorrect workflow configuration, complex environmental setup, and a lack of basic knowledge about Airflow and the external systems that it interacts with. Additionally, our analysis of references shared within the collected posts reveals that beyond the frequently cited Airflow documentation, documentation from external systems and third-party providers is also commonly referenced to address Airflow-related challenges.

Submitted 31 May, 2024; originally announced June 2024.

Comments: This is the preprint version of a paper that has been submitted to the Journal of Systems and Software

arXiv:2405.00796 [pdf, other]

Does Using Bazel Help Speed Up Continuous Integration Builds?

Authors: Shenyu Zheng, Bram Adams, Ahmed E. Hassan

Abstract: A long continuous integration (CI) build forces developers to wait for CI feedback before starting subsequent development activities, leading to time wasted. In addition to a variety of build scheduling and test selection heuristics studied in the past, new artifact-based build technologies like Bazel have built-in support for advanced performance optimizations such as parallel build and incremental build (caching of build results). However, little is known about the extent to which new build technologies like Bazel deliver on their promised benefits, especially for long-build duration projects. In this study, we collected 383 Bazel projects from GitHub, then studied their parallel and incremental build usage of Bazel in 4 popular CI services, and compared the results with Maven projects. We conducted 3,500 experiments on 383 Bazel projects and analyzed the build logs of a subset of 70 buildable projects to evaluate the performance impact of Bazel's parallel builds. Additionally, we performed 102,232 experiments on the 70 buildable projects' last 100 commits to evaluate Bazel's incremental build performance. Our results show that 31.23% of Bazel projects adopt a CI service but do not use Bazel in the CI service, while for those who do use Bazel in CI, 27.76% of them use other tools to facilitate Bazel's execution. Compared to sequential builds, the median speedups for long-build duration projects are 2.00x, 3.84x, 7.36x, and 12.80x, at parallelism degrees 2, 4, 8, and 16, respectively, even though, compared to a clean build, applying incremental build achieves a median speedup of 4.22x (with a build system tool-independent CI cache) and 4.71x (with a build system tool-specific cache) for long-build duration projects. Our results provide guidance for developers to improve the usage of Bazel in their projects.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.15632 [pdf, other]

Non-Fungible Programs: Private Full-Stack Applications for Web3

Authors: Blake Regalia, Benjamin Adams

Abstract: The greatest advantage that Web3 applications offer over Web 2.0 is the evolution of the data access layer. Opaque, centralized services that compelled trust from users are replaced by trustless, decentralized systems of smart contracts. However, the public nature of blockchain-based databases, on which smart contracts transact, has typically presented a challenge for applications that depend on data privacy or that rely on participants having incomplete information. This has changed with the introduction of confidential smart contract networks that encrypt the memory state of active contracts as well as their databases stored on-chain. With confidentiality, contracts can more readily implement novel interaction mechanisms that were previously infeasible. Meanwhile, in both Web 2.0 and Web3 applications the user interface continues to play a crucial role in translating user intent into actionable requests. In many cases, developers have shifted intelligence and autonomy into the client-side, leveraging Web technologies for compute, graphics, and networking. Web3's reliance on such frontends has revealed a pain point though, namely that decentralized applications are not accessible to end users without a persistent host serving the application. Here we introduce the Non-Fungible Program (NFP) model for developing self-contained frontend applications that are distributed via blockchain, powered by Web technology, and backed by private databases persisted in encrypted smart contracts. Access to frontend code, as well as backend services, is controlled and guaranteed by smart contracts according to the NFT ownership model, eliminating the need for a separate host. By extension, NFP applications bring interactivity to token owners and enable new functionalities, such as authorization mechanisms for oracles, supplementary Web services, and overlay networks in a secure manner. In addition...

Submitted 23 April, 2024; originally announced April 2024.

Comments: 13 pages, 2 figures. ChainScience 2024

Report number: Chainsci/2024/04

arXiv:2404.10703 [pdf, other]

doi 10.1145/3660806

An Empirical Study on Code Review Activity Prediction and Its Impact in Practice

Authors: Doriane Olewicki, Sarra Habchi, Bram Adams

Abstract: During code reviews, an essential step in software quality assurance, reviewers have the difficult task of understanding and evaluating code changes to validate their quality and prevent introducing faults to the codebase. This is a tedious process where the effort needed is highly dependent on the code submitted, as well as the author's and the reviewer's experience, leading to median wait times for review feedback of 15-64 hours. Through an initial user study carried out with 29 experts, we found that re-ordering the files changed by a patch within the review environment has potential to improve review quality, as more comments are written (+23%), and participants' file-level hot-spot precision and recall increase to 53% (+13%) and 28% (+8%), respectively, compared to the alphanumeric ordering. Hence, this paper aims to help code reviewers by predicting which files in a submitted patch need to be (1) commented, (2) revised, or (3) are hot-spots (commented or revised). To predict these tasks, we evaluate two different types of text embeddings (i.e., Bag-of-Words and Large Language Models encoding) and review process features (i.e., code size-based and history-based features). Our empirical study on three open-source and two industrial datasets shows that combining the code embedding and review process features leads to better results than the state-of-the-art approach. For all tasks, F1-scores (median of 40-62%) are significantly better than the state-of-the-art (from +1 to +9%).

Submitted 13 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 20 pages + 3 pages ref

Journal ref: FSE 2024

arXiv:2403.18958 [pdf, other]

A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products

Authors: Harsh Patel, Dominique Boucher, Emad Fallahzadeh, Ahmed E. Hassan, Bram Adams

Abstract: This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17214 [pdf, other]

doi 10.1145/3650105.3652301

Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation

Authors: Marcos Macedo, Yuan Tian, Filipe R. Cogo, Bram Adams

Abstract: Code translation between programming languages is a long-existing and critical task in software engineering, facilitating the modernization of legacy systems, ensuring cross-platform compatibility, and enhancing software performance. With the recent advances in large language models (LLMs) and their applications to code translation, there is an increasing need for comprehensive evaluation of these models. In this study, we empirically analyze the generated outputs of eleven popular instruct-tuned LLMs with parameters ranging from 1B up to 46.7B on 3,820 translation pairs across five languages, including C, C++, Go, Java, and Python. Our analysis found that between 26.4% and 73.7% of code translations produced by our evaluated LLMs necessitate post-processing, as these translations often include a mix of code, quotes, and text rather than being purely source code. Overlooking the output format of these models can inadvertently lead to underestimation of their actual performance. This is particularly evident when evaluating them with execution-based metrics such as Computational Accuracy (CA). Our results demonstrate that a strategic combination of prompt engineering and regular expression can effectively extract the source code from the model generation output. In particular, our method can help eleven selected models achieve an average Code Extraction Success Rate (CSR) of 92.73%. Our findings shed light on and motivate future research to conduct more reliable benchmarks of LLMs for code translation.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted into 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge)

arXiv:2403.17154 [pdf, other]

On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance

Authors: Jaskirat Singh, Bram Adams, Ahmed E. Hassan

Abstract: Deciding what combination of operators to use across the Edge AI tiers to achieve specific latency and model performance requirements is an open question for MLOps engineers. This study aims to empirically assess the accuracy vs inference time trade-off of different black-box Edge AI deployment strategies, i.e., combinations of deployment operators and deployment tiers. In this paper, we conduct inference experiments involving 3 deployment operators (i.e., Partitioning, Quantization, Early Exit), 3 deployment tiers (i.e., Mobile, Edge, Cloud) and their combinations on four widely used Computer-Vision models to investigate the optimal strategies from the point of view of MLOps developers. Our findings suggest that Edge deployment using the hybrid Quantization + Early Exit operator could be preferred over non-hybrid operators (Quantization/Early Exit on Edge, Partition on Mobile-Edge) when faster latency is a concern at medium accuracy loss. However, when minimizing accuracy loss is a concern, MLOps engineers should prefer using only a Quantization operator on edge at a latency reduction or increase, respectively over the Early Exit/Partition (on edge/mobile-edge) and Quantized Early Exit (on edge) operators. In scenarios constrained by Mobile CPU/RAM resources, a preference for Partitioning across mobile and edge tiers is observed over mobile deployment. For models with smaller input data samples (such as FCN), a network-constrained cloud deployment can also be a better alternative than Mobile/Edge deployment and Partitioning strategies. For models with large input data samples (ResNet, ResNext, DUC), an edge tier having higher network/computational capabilities than Cloud/Mobile can be a more viable option than Partitioning and Mobile/Cloud deployment strategies.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.11025 [pdf, other]

Pre-Trained Language Models Represent Some Geographic Populations Better Than Others

Authors: Jonathan Dunn, Benjamin Adams, Harish Tayyar Madabushi

Abstract: This paper measures the skew in how well two families of LLMs represent diverse geographic populations. A spatial probing task is used with geo-referenced corpora to measure the degree to which pre-trained language models from the OPT and BLOOM series represent diverse populations around the world. Results show that these models perform much better for some populations than others. In particular, populations across the US and the UK are represented quite well while those in South and Southeast Asia are poorly represented. Analysis shows that both families of models largely share the same skew across populations. At the same time, this skew cannot be fully explained by sociolinguistic factors, economic factors, or geographic factors. The basic conclusion from this analysis is that pre-trained models do not equally represent the world's population: there is a strong skew towards specific geographic populations. This finding challenges the idea that a single model can be used for all populations.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2402.15990 [pdf, other]

doi 10.1007/s10664-024-10474-4

An Empirical Study of Challenges in Machine Learning Asset Management

Authors: Zhimin Zhao, Yihao Chen, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan

Abstract: In machine learning (ML), efficient asset management, including ML models, datasets, algorithms, and tools, is vital for resource optimization, consistent performance, and a streamlined development lifecycle. This enables quicker iterations, adaptability, reduced development-to-deployment time, and reliable outputs. Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration, which are crucial for the success of ML projects. Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms, employing a mixed-method approach to classify inquiries, extract challenges using BERTopic, and identify solutions through open card sorting and BERTopic clustering. We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed. We also find 79 solution topics, categorized under 18 macro-topics, highlighting software dependency, feature development, and file management as key solutions. This research underscores the need for further exploration of identified pain points and the importance of collaborative efforts across academia, industry, and the research community.

Submitted 28 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Journal ref: Empirical Software Engineering 2024

arXiv:2312.15350 [pdf, other]

Why Not Mitigate Vulnerabilities in Helm Charts?

Authors: Yihao Chen, Jiahuei Lin, Bram Adams, Ahmed E. Hassan

Abstract: [Context]: Containerization ensures the resilience of distributed applications by Kubernetes. Helm is a package manager for Kubernetes applications. A Helm package, namely "Chart", is a set of pre-configured resources with which one could quickly deploy a complex application. However, Helm broadens the attack surface of the distributed applications. [Objective]: This study aims to investigate the prevalence of fixable vulnerabilities, the factors related to the vulnerabilities, and current mitigation strategies in Helm Charts. [Method]: We conduct a mixed-methods study on 11,035 Helm Charts affected by 10,982 fixable vulnerabilities. We analyze the complexity of Charts and compare the distribution of vulnerabilities between official and unofficial Charts. Subsequently, we investigate vulnerability mitigation strategies from the Chart-associated repositories by a grounded theory. [Results]: Our findings highlight that the complexity of a Chart correlates with the number of vulnerabilities, and the official Charts do not contain fewer vulnerabilities compared to unofficial Charts. The 10,982 fixable vulnerabilities are at a median of high severity and can be easily exploited. In addition, we identify 11 vulnerability mitigation strategies in three categories. Due to the complexity of Charts, maintainers are required to investigate where a vulnerability impacts and how to mitigate it. The use of automated strategies is low as automation has limited capability (e.g., a higher number of false positives) in such complex Charts. [Conclusion]: There exists a need for automation tools that assist maintainers in mitigating vulnerabilities to reduce manual effort. In addition, Chart maintainers lack incentives to mitigate vulnerabilities, given a lack of guidelines for mitigation responsibilities. Adopting a shared responsibility model in the Helm ecosystem would increase its security.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.15058 [pdf, other]

doi 10.1109/MS.2024.3366111

The State of Documentation Practices of Third-party Machine Learning Models and Datasets

Authors: Ernesto Lang Oreamuno, Rohan Faiyaz Khan, Abdul Ali Bangash, Catherine Stinson, Bram Adams

Abstract: Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today--Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 7 pages, 4 figures, IEEESoftware format

Journal ref: IEEE Software 2024

arXiv:2311.12019 [pdf, other]

An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software

Authors: Aaditya Bhatia, Foutse Khomh, Bram Adams, Ahmed E Hassan

Abstract: The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algorithms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to comprehend the nature of technical debt in the ML code, and performed a survival analysis of the SATD to understand the evolution of such debts. We observed: i) Machine learning projects have a median percentage of SATD that is twice the median percentage of SATD in non-machine learning projects. ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components. iii) SATDs appear in ML projects earlier in the development process compared to non-ML projects. iv) Long-lasting SATDs are typically introduced during extensive code changes that span multiple files exhibiting low complexity.

Submitted 9 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2308.10452 [pdf, other]

doi 10.6084/m9.figshare.20686324.v1

Comparing Measures of Linguistic Diversity Across Social Media Language Data and Census Data at Subnational Geographic Areas

Authors: Sidney G. -J. Wong, Jonathan Dunn, Benjamin Adams

Abstract: This paper describes a preliminary study on the comparative linguistic ecology of online spaces (i.e., social media language data) and real-world spaces in Aotearoa New Zealand (i.e., subnational administrative areas). We compare measures of linguistic diversity between these different spaces and discuss how social media users align with real-world populations. The results from the current study suggest that there is potential to use online social media language data to observe spatial and temporal changes in linguistic diversity at subnational geographic areas; however, further work is required to understand how well social media represents real-world behaviour.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.10370 [pdf, ps, other]

cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models

Authors: Sidney G. -J. Wong, Matthew Durward, Benjamin Adams, Jonathan Dunn

Abstract: This paper describes our multiclass classification system developed as part of the LTEDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based crosslanguage pretrained language model, XLMRoBERTa, with spatially and temporally relevant social media language data. We also retrained a subset of models with simulated script-mixed social media language data with varied performance. We developed the best performing seven-label classification system for Malayalam based on weighted macro averaged F1 score (ranked first out of six) with variable performance for other language and class-label conditions. We found the inclusion of this spatio-temporal data improved the classification performance for all language and task conditions when compared with the baseline. The results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.

Submitted 24 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2307.13463 [pdf, other]

doi 10.1109/JPROC.2023.3273517

Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion

Authors: James Z. Wang, Sicheng Zhao, Chenyan Wu, Reginald B. Adams, Michelle G. Newman, Tal Shafir, Rachelle Tsachor

Abstract: The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion", coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Proceedings of the IEEE 2023

arXiv:2306.15851 [pdf]

Image-based Communication on Social Coding Platforms

Authors: Maleknaz Nayebi, Bram Adams

Abstract: Visual content in the form of images and videos has taken over general-purpose social networks in a variety of ways, streamlining and enriching online communications. We are interested to understand if and to what extent the use of images is popular and helpful in social coding platforms. We mined nine years of data from two popular software developers' platforms: the Mozilla issue tracking system, i.e., Bugzilla, and the most well-known platform for developers' Q/A, i.e., Stack Overflow. We further triangulated and extended our mining results by performing a survey with 168 software developers. We observed that, between 2013 and 2022, the number of posts containing image data on Bugzilla and Stack Overflow doubled. Furthermore, we found that sharing images makes other developers engage more and faster with the content. In the majority of cases in which an image is included in a developer's post, the information in that image is complementary to the text provided. Finally, our results showed that when an image is shared, understanding the content without the information in the image is unlikely for 86.9% of the cases. Based on these observations, we discuss the importance of considering visual content when analyzing developers and designing automation tools.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2305.09824 [pdf, other]

doi 10.1145/3639477.3639717

On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction

Authors: Doriane Olewicki, Sarra Habchi, Mathieu Nayrolles, Mojtaba Faramarzi, Sarath Chandar, Bram Adams

Abstract: Nowadays, software analytics tools using machine learning (ML) models to, for example, predict the risk of a code change are well established. However, as the goals of a project shift over time, and developers and their habits change, the performance of said models tends to degrade (drift) over time. Current retraining practices typically require retraining a new model from scratch on a large updated dataset when performance decay is observed, thus incurring a computational cost; also there is no continuity between the models as the past model is discarded and ignored during the new model training. Even though the literature has taken interest in online learning approaches, those have rarely been integrated and evaluated in industrial environments. This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft, evaluating both the performance and the required computational effort in comparison to the retraining-from-scratch approaches commonly used by the industry. LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data. To avoid so-called "catastrophic forgetting" of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training dataset, and hence model training time.

Submitted 12 February, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

Journal ref: 46th International Conference on Software Engineering: Software Engineering in Practice 2024

arXiv:2305.06426 [pdf, other]

Planning a Community Approach to Diabetes Care in Low- and Middle-Income Countries Using Optimization

Authors: Katherine B. Adams, Justin J. Boutilier, Sarang Deo, Yonatan Mintz

Abstract: Diabetes is a global health priority, especially in low- and middle-income countries, where over 50% of premature deaths are attributed to high blood glucose. Several studies have demonstrated the feasibility of using Community Health Worker (CHW) programs to provide affordable and culturally tailored solutions for early detection and management of diabetes. Yet, scalable models to design and implement CHW programs while accounting for screening, management, and patient enrollment decisions have not been proposed. We introduce an optimization framework to determine personalized CHW visits that maximize glycemic control at a community-level. Our framework explicitly models the trade-off between screening new patients and providing management visits to individuals who are already enrolled in treatment. We account for patients' motivational states, which affect their decisions to enroll or drop out of treatment and, therefore, the effectiveness of the intervention. We incorporate these decisions by modeling patients as utility-maximizing agents within a bi-level provider problem that we solve using approximate dynamic programming. By estimating patients' health and motivational states, our model builds visit plans that account for patients' tradeoffs when deciding to enroll in treatment, leading to reduced dropout rates and improved resource allocation. We apply our approach to generate CHW visit plans using operational data from a social enterprise serving low-income neighborhoods in urban areas of India. Through extensive simulation experiments, we find that our framework requires up to 73.4% less capacity than the best naive policy to achieve the same performance in terms of glycemic control. Our experiments also show that our solution algorithm can improve upon naive policies by up to 124.5% using the same CHW capacity.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 47 pages, 11 figures

arXiv:2304.08426 [pdf, other]

Understanding the Time to First Response In GitHub Pull Requests

Authors: Kazi Amit Hasan, Marcos Macedo, Yuan Tian, Bram Adams, Steven Ding

Abstract: The pull-based development is widely adopted in modern open-source software (OSS) projects, where developers propose changes to the codebase by submitting a pull request (PR). However, due to many reasons, PRs in OSS projects frequently experience delays across their lifespan, including prolonged waiting times for the first response. Such delays may significantly impact the efficiency and productivity of the development process, as well as the retention of new contributors as long-term contributors. In this paper, we conduct an exploratory study on the time-to-first-response for PRs by analyzing 111,094 closed PRs from ten popular OSS projects on GitHub. We find that bots frequently generate the first response in a PR, and significant differences exist in the timing of bot-generated versus human-generated first responses. We then perform an empirical study to examine the characteristics of bot- and human-generated first responses, including their relationship with the PR's lifetime. Our results suggest that the presence of bots is an important factor contributing to the time-to-first-response in the pull-based development paradigm, and hence should be separately analyzed from human responses. We also report the characteristics of PRs that are more likely to experience long waiting for the first human-generated response. Our findings have practical implications for newcomers to understand the factors contributing to delays in their PRs.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 11 pages

arXiv:2204.00155 [pdf, other]

doi 10.1145/3524842.3527957

How heated is it? Understanding GitHub locked issues

Authors: Isabella Ferreira, Bram Adams, Jinghui Cheng

Abstract: Although issues of open source software are created to discuss and solve technical problems, conversations can become heated, with discussants getting angry and/or agitated for a variety of reasons, such as poor suggestions or violation of community conventions. To prevent and mitigate discussions from getting heated, tools like GitHub have introduced the ability to lock issue discussions that violate the code of conduct or other community guidelines. Despite some early research on locked issues, there is a lack of understanding of how communities use this feature and of potential threats to validity for researchers relying on a dataset of locked issues as an oracle for heated discussions. To address this gap, we (i) quantitatively analyzed 79 GitHub projects that have at least one issue locked as too heated, and (ii) qualitatively analyzed all issues locked as too heated of the 79 projects, a total of 205 issues comprising 5,511 comments. We found that projects have different behaviors when locking issues: while 54 locked less than 10% of their closed issues, 14 projects locked more than 90% of their closed issues. Additionally, locked issues tend to have a similar number of comments, participants, and emoji reactions to non-locked issues. For the 205 issues locked as too heated, we found that one-third do not contain any uncivil discourse, and only 8.82% of the analyzed comments are actually uncivil. Finally, we found that the locking justifications provided by maintainers do not always match the label used to lock the issue. Based on our results, we identified three pitfalls to avoid when using the GitHub locked issues data and we provide recommendations for researchers and practitioners.

Submitted 31 March, 2022; originally announced April 2022.

Journal ref: In 19th International Conference on Mining Software Repositories (MSR'22), May 23-24, 2022, Pittsburgh, PA, USA

arXiv:2203.11365 [pdf, other]

Towards a Change Taxonomy for Machine Learning Systems

Authors: Aaditya Bhatia, Ellis E. Eghan, Manel Grichi, William G. Cavanagh, Zhen Ming Jiang, Bram Adams

Abstract: Machine Learning (ML) research publications commonly provide open-source implementations on GitHub, allowing their audience to replicate, validate, or even extend machine learning algorithms, data sets, and metadata. However, thus far little is known about the degree of collaboration activity happening on such ML research repositories, in particular regarding (1) the degree to which such repositories receive contributions from forks, (2) the nature of such contributions (i.e., the types of changes), and (3) the nature of changes that are not contributed back to forks, which might represent missed opportunities. In this paper, we empirically study contributions to 1,346 ML research repositories and their 67,369 forks, both quantitatively and qualitatively (by building on Hindle et al.'s seminal taxonomy of code changes). We found that while ML research repositories are heavily forked, only 9% of the forks made modifications to the forked repository. 42% of the latter sent changes to the parent repositories, half of which (52%) were accepted by the parent repositories. Our qualitative analysis on 539 contributed and 378 local (fork-only) changes, extends Hindle et al.'s taxonomy with one new top-level change category related to ML (Data), and 15 new sub-categories, including nine ML-specific ones (input data, output data, program data, sharing, change evaluation, parameter tuning, performance, pre-processing, model training). While the changes that are not contributed back by the forks mostly concern domain-specific customizations and local experimentation (e.g., parameter tuning), the origin ML repositories do miss out on a non-negligible 15.4% of Documentation changes, 13.6% of Feature changes and 11.4% of Bug fix changes. The findings in this paper will be useful for practitioners, researchers, toolsmiths, and educators.

Submitted 12 December, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.02708 [pdf, other]

High-resolution Coastline Extraction in SAR Images via MISP-GGD Superpixel Segmentation

Authors: Odysseas Pappas, Nantheera Anantrasirichai, Byron Adams, Alin Achim

Abstract: High accuracy coastline/shoreline extraction from SAR imagery is a crucial step in a number of maritime and coastal monitoring applications. We present a method based on image segmentation using the Generalised Gamma Mixture Model superpixel algorithm (MISP-GGD). MISP-GGD produces superpixels adhering with great accuracy to object edges in the image, such as the coastline. Unsupervised clustering of the generated superpixels according to textural and radiometric features allows for generation of a land/water mask from which a highly accurate coastline can be extracted. We present results of our proposed method on a number of SAR images of varying characteristics.

Submitted 5 March, 2022; originally announced March 2022.

Comments: To appear in proceedings CIE RADAR 2021

arXiv:2202.08960 [pdf, other]

Toward a traceable, explainable, and fair JD/Resume recommendation system

Authors: Amine Barrak, Bram Adams, Amal Zouaq

Abstract: In the last few decades, companies are interested to adopt an online automated recruitment process in an international recruitment environment. The problem is that the recruitment of employees through the manual procedure is a time and money consuming process. As a result, processing a significant number of applications through conventional methods can lead to the recruitment of clumsy individuals. Different JD/Resume matching model architectures have been proposed and reveal a high accuracy level in selecting relevant candidates for the required job positions. However, the development of an automatic recruitment system is still one of the main challenges. The reason is that the development of a fully automated recruitment system is a difficult task and poses different challenges. For example, providing a detailed matching explanation for the targeted stakeholders is needed to ensure a transparent recommendation. There are several knowledge bases that represent skills and competencies (e.g., ESCO, O*NET) that are used to identify the candidate and the required job skills for a matching purpose. Besides, modern pre-trained language models are fine-tuned for this context such as identifying lines where a specific feature was introduced. Typically, pre-trained language models use transfer-based machine learning models to be fine-tuned for a specific field. In this proposal, our aim is to explore how modern language models (based on transformers) can be combined with knowledge bases and ontologies to enhance the JD/Resume matching process. Our system aims at using knowledge bases and features to support the explainability of the JD/Resume matching. Finally, given that multiple software components, datasets, ontology, and machine learning models will be explored, we aim at proposing a fair, explainable, and traceable architecture for a Resume/JD matching purpose.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2109.07689 [pdf, other]

Quoka Atlas of Scholarly Knowledge Production: An Interactive Sensemaking Tool for Exploring the Outputs of Research Institutions

Authors: Benjamin Adams, Richard Hosking

Abstract: The vast amount of research produced at institutions world-wide is extremely diverse, and coarse-grained quantitative measures of impact often obscure the individual contributions of these institutions to specific research fields and topics. We show that by applying an information retrieval model to index research articles which are faceted by institution and time, we can develop tools to rank institutions given a keyword query. We present an interactive atlas, Quoka, designed to enable a user to explore these rankings contextually by geography and over time. Through a set of use cases we demonstrate that the atlas can be used to perform sensemaking tasks to learn and collect information about the relationships between institutions and scholarly knowledge production.

Submitted 15 September, 2021; originally announced September 2021.

Comments: 10 pages, 5 figures

ACM Class: H.3.3; H.3.7

arXiv:2108.09905 [pdf, other]

doi 10.1145/3479497

The "Shut the f**k up" Phenomenon: Characterizing Incivility in Open Source Code Review Discussions

Authors: Isabella Ferreira, Jinghui Cheng, Bram Adams

Abstract: Code review is an important quality assurance activity for software development. Code review discussions among developers and maintainers can be heated and sometimes involve personal attacks and unnecessary disrespectful comments, demonstrating, therefore, incivility. Although incivility in public discussions has received increasing attention from researchers in different domains, the knowledge about the characteristics, causes, and consequences of uncivil communication is still very limited in the context of software development, and more specifically, code review. To address this gap in the literature, we leverage the mature social construct of incivility as a lens to understand confrontational conflicts in open source code review discussions. For that, we conducted a qualitative analysis on 1,545 emails from the Linux Kernel Mailing List (LKML) that were associated with rejected changes. We found that more than half (66.66%) of the non-technical emails included uncivil features. Particularly, frustration, name calling, and impatience are the most frequent features in uncivil emails. We also found that there are civil alternatives to address arguments, while uncivil comments can potentially be made by any people when discussing any topic. Finally, we identified various causes and consequences of such uncivil communication. Our work serves as the first study about the phenomenon of in(civility) in open source software development, paving the road for a new field of research about collaboration and communication in the context of software engineering activities.

Submitted 22 August, 2021; originally announced August 2021.

arXiv:2107.10168 [pdf, other]

doi 10.1109/TEM.2021.3122012

Towards Using Package Centrality Trend to Identify Packages in Decline

Authors: Suhaib Mujahid, Diego Elias Costa, Rabe Abdalkareem, Emad Shihab, Mohamed Aymen Saied, Bram Adams

Abstract: Due to their increasing complexity, today's software systems are frequently built by leveraging reusable code in the form of libraries and packages. Software ecosystems (e.g., npm) are the primary enablers of this code reuse, providing developers with a platform to share their own and use others' code. These ecosystems evolve rapidly: developers add new packages every day to solve new problems or provide alternative solutions, causing obsolete packages to decline in their importance to the community. Developers should avoid depending on packages in decline, as these packages are reused less over time and may become less frequently maintained. However, current popularity metrics (e.g., Stars, and Downloads) are not fit to provide this information to developers because their semantics do not aptly capture shifts in the community interest. In this paper, we propose a scalable approach that uses the package's centrality in the ecosystem to identify packages in decline. We evaluate our approach with the npm ecosystem and show that the trends of centrality over time can correctly distinguish packages in decline with an ROC-AUC of 0.9. The approach can capture 87% of the packages in decline, on average 18 months before the trend is shown in currently used package popularity metrics. We implement this approach in a tool that can be used to augment the npms metrics and help developers avoid packages in decline when reusing packages from npm.

Submitted 19 October, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

Comments: Accepted in the Special Issue on Collaboration and Innovation Dynamics in Software Ecosystems

ACM Class: D.2.0; D.2.13

Journal ref: IEEE Transactions on Engineering Management Journal (TEM), 2021

arXiv:2104.13713 [pdf, other]

doi 10.1007/s10664-021-09977-1

Individual Differences Limit Predicting Well-being and Productivity Using Software Repositories: A Longitudinal Industrial Study

Authors: Miikka Kuutila, Mika Mäntylä, Maëlick Claes, Marko Elovainio, Bram Adams

Abstract: Reports of poor work well-being and fluctuating productivity in software engineering have been reported in both academic and popular sources. Understanding and predicting these issues through repository analysis might help manage software developers' well-being. Our objective is to link data from software repositories, that is commit activity, communication, expressed sentiments, and job events, with measures of well-being obtained with a daily experience sampling questionnaire. To achieve our objective, we studied a single software project team for eight months in the software industry. Additionally, we performed semi-structured interviews to explain our results. The acquired quantitative data are analyzed with generalized linear mixed-effects models with autocorrelation structure. We find that individual variance accounts for most of the $R^2$ values in models predicting developers' experienced well-being and productivity. In other words, using software repository variables to predict developers' well-being or productivity is challenging due to individual differences. Prediction models developed for each developer individually work better, with fixed effects $R^2$ value of up to 0.24. The semi-structured interviews give insights into the well-being of software developers and the benefits of chat interaction. Our study suggests that individualized prediction models are needed for well-being and productivity prediction in software development.

Submitted 28 April, 2021; originally announced April 2021.

Comments: Accepted to Empirical Software Engineering journal

arXiv:2104.01290 [pdf, other]

doi 10.18653/v1/P17

Measuring Linguistic Diversity During COVID-19

Authors: Jonathan Dunn, Tom Coupe, Benjamin Adams

Abstract: Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether this leads to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.

Submitted 2 April, 2021; originally announced April 2021.

Journal ref: Proceedings of the 4th Workshop on NLP and Computational Social Science (2020)

arXiv:2103.10615 [pdf, other]

The Impacts of Sentiments and Tones in Community-Generated Issue Discussions

Authors: Arghavan Sanei, Jinghui Cheng, Bram Adams

Abstract: The diverse community members who contribute to the discussions on issue tracking systems of open-source software projects often exhibit complex affective states such as sentiments and tones. These affective states can significantly influence the effectiveness of the issue discussions in elaborating the initial ideas into actionable tasks that the development teams need to address. In this paper, we present an extended empirical study to investigate the impacts of sentiments and tones in community-generated issue discussions. We created and validated a large dataset of sentiments and tones in the issues posts and comments created by diverse community members in three popular open source projects. Our analysis results drew a complex picture of the relationships between, on the one hand, the sentiments and tones in the issue discussions, and on the other hand, various discussion and development-related measures such as the discussion length and the issue resolution time. We also found that when factors such as the issue poster roles and the issue types were controlled, sentiments and tones had varied associations with the measures. Insights gained from these findings can support open source community members in making and moderating effective issue discussions and guide the design of tools to better support community engagement.

Submitted 18 March, 2021; originally announced March 2021.

Comments: 10 pages, 3 figures, CHASE2021

arXiv:2012.01403 [pdf, other]

Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories

Authors: Minke Xiu, Ellis E. Eghan, Zhen Ming Jiang, Bram Adams

Abstract: Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train and deploy such models, making effective reuse of the ML models a necessity. Such discovery and reuse by practitioners and researchers are being addressed by public ML package repositories, which bundle up pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities and usage contexts against popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique SE practices and challenges for sharing ML packages. These findings and implications would be useful for data scientists, researchers and software developers who intend to use these shared ML packages.

Submitted 8 December, 2020; v1 submitted 2 December, 2020; originally announced December 2020.

arXiv:2010.09139 [pdf]

We Need to Rethink How We Describe and Organize Spatial Information: Instrumenting and Observing the Community of Users to Improve Data Description and Discovery

Authors: Benjamin Adams, Mark Gahegan

Abstract: In Spatial Data Infrastructure or Cyber Infrastructure, the description of geographic data semantics is intended to support data discovery, reuse and integration. In the vast majority of cases the producers of these data generate descriptions based on particular understandings of what uses the data are good for. This producer-oriented perspective means that the descriptions often do not help to answer the question of whether a data set is of use for a consumer who might want to apply it in a different context. In this paper, we discuss the role geographic information observatories can play in providing an infrastructure for observing the context of data use by consumers. These observations of data pragmatics lead to operational statistical methods that will support better fitness-for-use assessment. Finally, we highlight some of the challenges to building these observatories, and briefly discuss strategies to address those challenges.

Submitted 18 October, 2020; originally announced October 2020.

Comments: 6 pages, 2 figures

Journal ref: GEOProcessing 2016 : The Eighth International Conference on Advanced Geographic Information Systems, Applications, and Services (2016), 131-136

arXiv:2009.09019 [pdf, other]

On the Threat of npm Vulnerable Dependencies in Node.js Applications

Authors: Mahmoud Alfadel, Diego Elias Costa, Mouafak Mokhallalati, Emad Shihab, Bram Adams

Abstract: Software vulnerabilities have a large negative impact on the software systems that we depend on daily. Reports on software vulnerabilities always paint a grim picture, with some reports showing that 83% of organizations depend on vulnerable software. However, our experience leads us to believe that, in the grand scheme of things, these software vulnerabilities may have less impact than what is reported. Therefore, we perform a study to better understand the threat of npm vulnerable packages used in Node.js applications. We define three threat levels for vulnerabilities in packages, based on their lifecycle, where a package vulnerability is assigned a low threat level if it was hidden or still unknown at the time it was used in the dependent application (t), medium threat level if the vulnerability was reported but not yet published at t, and high if it was publicly announced at t. Then, we perform an empirical study involving 6,673 real-world, active, and mature open source Node.js applications. Our findings show that although 67.93% of the examined applications depend on at least one vulnerable package, 94.91% of the vulnerable packages in those affected applications are classified as having low threat. Moreover, we find that in the case of vulnerable packages classified as having high threat, it is the application's lack of updating that makes them vulnerable, i.e., it is not the existence of the vulnerability that is the real problem.
Furthermore, we verify our findings at different stages of the application's lifetime and find that our findings still hold. Our study argues that when it comes to software vulnerabilities, things may not be as bad as they seem and that considering vulnerability threat is key.

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2005.08738 [pdf, other]

A country comparison of place-based activity response to COVID-19 policies

Authors: Grant McKenzie, Benjamin Adams

Abstract: The emergence of the novel Coronavirus Disease in late 2019 (COVID-19) and subsequent pandemic led to an immense disruption in the daily lives of almost everyone on the planet. Faced with the consequences of inaction, most national governments responded with policies that restricted the activities conducted by their inhabitants. As schools and businesses shuttered, the mobility of these people decreased. This reduction in mobility, and related activities, was recorded through ubiquitous location-enabled personal mobile devices. Patterns emerged that varied by place-based activity. In this work the differences in these place-based activity patterns are investigated across nations, specifically focusing on the relationship between government enacted policies and changes in community activity patterns. We show that people's activity response to government action varies widely both across nations as well as regionally within them. Three assessment measures are devised and the results correlate with a number of global indices. We discuss these findings and the relationship between government action and residents' response.

Submitted 18 May, 2020; originally announced May 2020.

Comments: 21 pages, 7 figures, 3 tables

arXiv:2004.00809 [pdf, other]

doi 10.17608/k6.auckland.9869252.v2

Mapping Languages and Demographics with Georeferenced Corpora

Authors: Jonathan Dunn, Ben Adams

Abstract: This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets. The goal is to determine (i) which dataset best represents population demographics; (ii) in what parts of the world the datasets are most representative of actual populations; and (iii) how to weight the datasets to provide more accurate representations of underlying populations. The paper finds that the two datasets represent very different populations and that they correlate with actual populations with values of r=0.60 (social media) and r=0.49 (web-crawled). Further, Twitter data makes better predictions about the inventory of languages used in each country.

Submitted 2 April, 2020; originally announced April 2020.

Comments: Proceedings of GeoComputation 19

arXiv:1910.08876 [pdf]

Release Practices for Mobile Apps--What do Users and Developers Think?

Authors: Maleknaz Nayebi, Bram Adams, Guenther Ruhe

Abstract: Large software organizations such as Facebook or Netflix, who otherwise make daily or even hourly releases of their web applications using continuous delivery, have had to invest heavily into a customized release strategy for their mobile apps, because the vetting process of app stores introduces lag and uncertainty into the release process. Amidst these large, resourceful organizations, it is unknown how the average mobile app developer organizes her app's releases, even though an incorrect strategy might bring a premature app update to the market that drives away customers towards the heavy market competition. To understand the common release strategies used for mobile apps, the rationale behind them and their perceived impact on users, we performed two surveys with users and developers. We found that half of the developers have a clear strategy for their mobile app releases, since especially the more experienced developers believe that it affects user feedback. We also found that users are aware of new app updates, yet only half of the surveyed users enables automatic updating of apps. While the release date and frequency is not a decisive factor to install an app, users prefer to install apps that were updated more recently and less frequently. Our study suggests that an app's release strategy is a factor that affects the ongoing success of mobile apps.

Submitted 19 October, 2019; originally announced October 2019.

Journal ref: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (Vol. 1, pp. 552-562). IEEE

arXiv:1910.06493 [pdf, other]

doi 10.17608/k6.auckland.9846323.v1

Understanding population fluctuations through volunteered geographic information and novel indicators: The experience of Rakiura, Stewart Island, New Zealand

Authors: Mathew Darling, Benjamin Adams, Caroline Orchiston, Thomas Wilson, Brendon Bradley

Abstract: In an era of heterogeneous data, novel methods and volunteered geographic information provide opportunities to understand how people interact with a place. However, it is not enough to simply have such heterogeneous data, instead an understanding of its usability and reliability needs to be undertaken. Here, we draw upon the case study of Rakiura, Stewart Island where manifested passenger numbers across the Foveaux Strait are known. We have built a population model to ground truth such novel indicators. In our preliminary study, we find that a number of indicators offer the opportunity to understand fluctuations in populations. Some indicators (such as wastewater volumes) can suggest relative changes in populations in a raw form. While other indicators (such as TripAdvisor reviews or Instagram posts) require further data enrichment to get insights into population fluctuations. This research forms part of a larger research project looking to test and apply such novel indicators to inform disaster risk assessments.

Submitted 14 October, 2019; originally announced October 2019.

Comments: 8 pages, GeoComputation 2019

arXiv:1910.06484 [pdf]

Spatial Data Science: Closing the human-spatial computing-environment loop

Authors: Benjamin Adams

Abstract: Over the last decade, the term spatial computing has grown to have two different, though not entirely unrelated, definitions. The first definition of spatial computing stems from industry, where it refers primarily to new kinds of augmented, virtual, mixed-reality, and natural user interface technologies. A second definition coming out of academia takes a broader perspective that includes active research in geographic information science as well as the aforementioned novel UI technologies. Both senses reflect an ongoing shift toward increased interaction with computing interfaces and sensors embedded in the environment and how the use of these technologies influence how we behave and make sense of and even change the world we live in. Regardless of the definition, research in spatial computing is humming along nicely without the need to identify new research agendas or new labels for communities of researchers. However, as a field of research, it could be helpful to view spatial data science as the glue that coheres spatial computing with problem-solving and learning in the real world into a more holistic discipline.

Submitted 14 October, 2019; originally announced October 2019.

Comments: 2 pages, Spatial Data Science Symposium

arXiv:1909.10496 [pdf, other]

doi 10.1109/LRA.2020.3006793

Swarm Relays: Distributed Self-Healing Ground-and-Air Connectivity Chains

Authors: Vivek Shankar Varadharajan, David St-Onge, Bram Adams, Giovanni Beltrame

Abstract: The coordination of robot swarms - large decentralized teams of robots - generally relies on robust and efficient inter-robot communication. Maintaining communication between robots is particularly challenging in field deployments. Unstructured environments, limited computational resources, low bandwidth, and robot failures all contribute to the complexity of connectivity maintenance. In this paper, we propose a novel lightweight algorithm to navigate a group of robots in complex environments while maintaining connectivity by building a chain of robots. The algorithm is robust to single robot failures and can heal broken communication links. The algorithm works in 3D environments: when a region is unreachable by wheeled robots, the chain is extended with flying robots. We test the performance of the algorithm using up to 100 robots in a physics-based simulator with three mazes and different robot failure scenarios. We then validate the algorithm with physical platforms: 7 wheeled robots and 6 flying ones, in homogeneous and heterogeneous scenarios.

Submitted 30 June, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: 9 pages, 8 figures, Accepted for publication in Robotics and Automation Letters (RAL)

arXiv:1907.00863 [pdf, other]

Understanding GCC Builtins to Develop Better Tools

Authors: Manuel Rigger, Stefan Marr, Bram Adams, Hanspeter Mössenböck

Abstract: C programs can use compiler builtins to provide functionality that the C language lacks. On Linux, GCC provides several thousands of builtins that are also supported by other mature compilers, such as Clang and ICC. Maintainers of other tools lack guidance on whether and which builtins should be implemented to support popular projects. To assist tool developers who want to support GCC builtins, we analyzed builtin use in 4,913 C projects from GitHub. We found that 37% of these projects relied on at least one builtin. Supporting an increasing proportion of projects requires support of an exponentially increasing number of builtins; however, implementing only 10 builtins already covers over 30% of the projects. Since we found that many builtins in our corpus remained unused, the effort needed to support 90% of the projects is moderate, requiring about 110 builtins to be implemented. For each project, we analyzed the evolution of builtin use over time and found that the majority of projects mostly added builtins. This suggests that builtins are not a legacy feature and must be supported in future tools. Systematic testing of builtin support in existing tools revealed that many lacked support for builtins either partially or completely; we also discovered incorrect implementations in various tools, including the formally verified CompCert compiler.

Submitted 1 July, 2019; originally announced July 2019.

Comments: Accepted at ESEC/FSE 2019 (see https://esec-fse19.ut.ee/program/research-papers/)

arXiv:1906.01498 [pdf, other]

Multimodal Ensemble Approach to Incorporate Various Types of Clinical Notes for Predicting Readmission

Authors: Bonggun Shin, Julien Hogan, Andrew B. Adams, Raymond J. Lynch, Rachel E. Patzer, Jinho D. Choi

Abstract: Electronic Health Records (EHRs) have been heavily used to predict various downstream clinical tasks such as readmission or mortality. One of the modalities in EHRs, clinical notes, has not been fully explored for these tasks due to its unstructured and inexplicable nature. Although recent advances in deep learning (DL) enables models to extract interpretable features from unstructured data, they often require a large amount of training data. However, many tasks in medical domains inherently consist of small sample data with lengthy documents; for a kidney transplant as an example, data from only a few thousand of patients are available and each patient's document consists of a couple of millions of words in major hospitals. Thus, complex DL methods cannot be applied to these kinds of domains. In this paper, we present a comprehensive ensemble model using vector space modeling and topic modeling. Our proposed model is evaluated on the readmission task of kidney transplant patients and improves 0.0211 in terms of c-statistics from the previous state-of-the-art approach using structured data, while typical DL methods fail to beat this approach. The proposed architecture provides the interpretable score for each feature from both modalities, structured and unstructured data, which is shown to be meaningful through a physician's evaluation.

Submitted 31 May, 2019; originally announced June 2019.

Comments: 4 pages, IEEE BHI 2019

Journal ref: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, 2019 (BHI'19)

arXiv:1905.10677 [pdf, ps, other]

An Exploratory Study on Machine Learning Model Stores

Authors: Minke Xiu, Zhen Ming Jiang, Bram Adams

Abstract: Recent advances in Artificial Intelligence, especially in Machine Learning (ML), have brought applications previously considered as science fiction (e.g., virtual personal assistants and autonomous cars) into the reach of millions of everyday users. Since modern ML technologies like deep learning require considerable technical expertise and resource to build custom models, reusing existing models trained by experts has become essential. This is why in the past year model stores have been introduced, which, similar to mobile app stores, offer organizations and developers access to pre-trained models and/or their code to train, evaluate, and predict samples. This paper conducts an exploratory study on three popular model stores (AWS marketplace, Wolfram neural net repository, and ModelDepot) that compares the information elements (features and policies) provided by model stores to those used by the two popular mobile app stores (Google Play and Apple's App Store). We have found that the model information elements vary among the different model stores, with 65% elements shared by all three studied stores. Model stores share five information elements with mobile app stores, while eight elements are unique to model stores and four elements unique to app stores. Only few models were available on multiple model stores.
Our findings allow to better understand the differences between ML models and "regular" source code components or applications, and provide inspiration to identify software engineering practices (e.g., in requirements and delivery) specific to ML applications.

Submitted 25 February, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

arXiv:1905.04771 [pdf, other]

Failure-Tolerant Connectivity Maintenance for Robot Swarms

Authors: Vivek Shankar Varadharajan, Bram Adams, Giovanni Beltrame

Abstract: Connectivity maintenance plays a key role in achieving a desired global behavior among a swarm of robots. However, connectivity maintenance in realistic environments is hampered by lack of computation resources, low communication bandwidth, robot failures, and unstable links. In this paper, we propose a novel decentralized connectivity-preserving algorithm that can be deployed on top of other behaviors to enforce connectivity constraints. The algorithm takes a set of targets to be reached while keeping a minimum number of redundant links between robots, with the goal of guaranteeing bandwidth and reliability. Robots then incrementally build and maintain a communication backbone with the specified number of links. We empirically study the performance of the algorithm, analyzing its time to convergence, as well as robustness to faults injected into the backbone robots. Our results statistically demonstrate the algorithm's ability to preserve the desired connectivity constraints and to reach the targets with up to 70 percent of individual robot failures in the communication backbone.

Submitted 12 May, 2019; originally announced May 2019.

Comments: 20 pages, 7 figures, Presented at ARMS Workshop at AAMAS

arXiv:1812.08836 [pdf, other]

doi 10.4230/DagRep.8.3.94

Automatic Quality Assurance and Release (Report from Dagstuhl Seminar 18122)

Authors: Bram Adams, Benoit Baudry, Sigrid Eldh, Andy Zaidman, Gerald Schermann

Abstract: This report documents the program and the outcomes of Dagstuhl Seminar 18122 "Automatic Quality Assurance and Release". The main goal of this seminar was to bridge the knowledge divide on how researchers and industry professionals reason about and implement DevOps for automatic quality assurance. Through the seminar, we have built up a common understanding of DevOps tools and practices, but we have also identified major academic and educational challenges for this field of research.

Submitted 20 December, 2018; originally announced December 2018.

MSC Class: 18122

arXiv:1808.09568 [pdf, other]

doi 10.1007/s11263-019-01215-y

ARBEE: Towards Automated Recognition of Bodily Expression of Emotion In the Wild

Authors: Yu Luo, Jianbo Ye, Reginald B. Adams, Jr., Jia Li, Michelle G. Newman, James Z. Wang

Abstract: Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expressions and body movements. The current research, as a multidisciplinary effort among computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data for computers to learn to recognize body languages of humans. To accomplish this task, a large and growing annotated dataset with 9,876 video clips of body movements and 13,239 human characters, named BoLD (Body Language Dataset), has been created. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system to model the emotional expressions based on bodily movements, named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate computability of bodily expression. We report and compare results of several other baseline methods which were developed for action recognition based on two different modalities, body skeleton, and raw image.
The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding that will enable future robots to interact and collaborate more effectively with humans.

Submitted 9 July, 2019; v1 submitted 28 August, 2018; originally announced August 2018.

arXiv:1808.05409 [pdf, other]

doi 10.1145/3239235.3239245

Using Experience Sampling to link Software Repositories with Emotions and Work Well-Being

Authors: Miikka Kuutila, Mika Mäntylä, Maëlick Claes, Marko Elovainio, Bram Adams

Abstract: Background: The experience sampling method studies everyday experiences of humans in natural environments. In psychology it has been used to study the relationships between work well-being and productivity. To our best knowledge, daily experience sampling has not been previously used in software engineering. Aims: Our aim is to identify links between software developers self-reported affective states and work well-being and measures obtained from software repositories. Method: We perform an experience sampling study in a software company for a period of eight months, we use logistic regression to link the well-being measures with development activities, i.e. number of commits and chat messages. Results: We find several significant relationships between questionnaire variables and software repository variables. To our surprise relationship between hurry and number of commits is negative, meaning more perceived hurry is linked with a smaller number of commits. We also find a negative relationship between social interaction and hindered work well-being. Conclusions: The negative link between commits and hurry is counter-intuitive and goes against previous lab-experiments in software engineering that show increased efficiency under time pressure. Overall, our work is an initial step in using experience sampling in software engineering and validating theories on work well-being from other fields in the domain of software engineering.

Submitted 4 September, 2018; v1 submitted 16 August, 2018; originally announced August 2018.

Comments: International Symposium on Empirical Software Engineering and Measurement (ESEM), 10 pages

arXiv:1807.00518 [pdf, other]

doi 10.1109/MS.2017.46

App Store 2.0: From Crowd Information to Actionable Feedback in Mobile Ecosystems

Authors: María Gómez, Bram Adams, Walid Maalej, Martin Monperrus, Romain Rouvoy

Abstract: Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app stores and discuss the different kinds of actionable feedbacks that app stores can generate using crowdsourced information.

Submitted 2 July, 2018; originally announced July 2018.

Journal ref: IEEE Software, Institute of Electrical and Electronics Engineers, 2017, 34, pp.81-89

arXiv:1804.08491 [pdf]

It is Free and Always Will Be - Trading Personal Information and Privacy for the Convenience of Online Services

Authors: Brandon Adams, Aaron Clark, Josh Craven

Abstract: Internet users today are constantly giving away their personal information and privacy through social media, tracking cookies, 'free' email, and single sign-on authentication in order to access convenient online services. Unfortunately, the elected officials who are supposed to be regulating these technologies often know less about informed consent and data ownership than the users themselves. This is why without changes, internet users may continue to be exploited by companies offering free and convenient online services.

Submitted 23 April, 2018; originally announced April 2018.

Comments: 12 Pages, 3 Pages of Citations

arXiv:1802.05084 [pdf, other]

doi 10.1145/3180155.3180193

Do Programmers Work at Night or During the Weekend?

Authors: Maëlick Claes, Mika Mäntylä, Miikka Kuutila, Bram Adams

Abstract: Abnormal working hours can reduce work health, general well-being, and productivity, independent from a profession. To inform future approaches for automatic stress and overload detection, this paper establishes empirically collected measures of the work patterns of software engineers. To this aim, we perform the first large-scale study of software engineers' working hours by investigating the time stamps of commit activities of 86 large open source software projects, both containing hired and volunteer developers. We find that two thirds of software engineers mainly follow typical office hours, empirically established to be from 10h to 18h, and do not usually work during nights and weekends. Large variations between projects and individuals exist. Surprisingly, we found no support that project maturation would decrease abnormal working hours. In the Firefox case study, we found that hired developers work more during office hours while seniority, either in terms of number of commits or job status, did not impact working hours. We conclude that the use of working hours or timestamps of work products for stress detection requires establishing baselines at the level of individuals.

Submitted 14 February, 2018; originally announced February 2018.

Journal ref: 40th International Conference on Software Engineering, 2018

arXiv:1711.04532 [pdf, ps, other]

Towards an interdisciplinary, socio-technical analysis of software ecosystem health

Authors: Tom Mens, Bram Adams, Josianne Marsan

Abstract: This extended abstract presents the research goals and preliminary research results of the interdisciplinary research project SECOHealth, an ongoing collaboration between research teams of Polytechnique Montreal (Canada), the University of Mons (Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and practice in software engineering by delivering a validated interdisciplinary scientific methodology and a catalog of guidelines and recommendation tools for improving software ecosystem health.

Submitted 13 November, 2017; originally announced November 2017.

Comments: 3 pages; presented at BENEVOL 2017, the BElgian-NEtherlands software eVOLution symposium, December 2017, Antwerp, Belgium

arXiv:1704.03652 [pdf, other]

doi 10.1109/MSR.2017.3

Abnormal Working Hours: Effect of Rapid Releases and Implications to Work Content

Authors: Maëlick Claes, Mika Mäntylä, Miikka Kuutila, Bram Adams

Abstract: During the past years, overload at work leading to psychological diseases, such as burnouts, have drawn more public attention. This paper is a preliminary step toward an analysis of the work patterns and possible indicators of overload and time pressure on software developers with mining software repositories approach. We explore the working pattern of developers in the context of Mozilla Firefox, a large and long-lived open source project. To that end we investigate the impact of the move from traditional to rapid release cycle on work pattern. Moreover we compare Mozilla Firefox work pattern with another Mozilla product, Firefox OS, which has a different release cycle than Firefox. We find that both projects exhibit healthy working patterns, i.e. lower activity during the weekends and outside of office hours. Firefox experiences proportionally more activity on weekends than Firefox OS (Cohen's d = 0.94). We find that switching to rapid releases has reduced weekend work (Cohen's d = 1.43) and working during the night (Cohen's d = 0.45). This result holds even when we limit the analyzes on the hired resources, i.e. considering only individuals with Mozilla foundation email address, although, the effect sizes are smaller for weekends (Cohen's d = 0.64) and nights (Cohen's d = 0.23). Moreover, we use dissimilarity word clouds and find that work during the weekend is more technical while work during the week expresses more positive sentiment with words like "good" and "nice".
Our results suggest that moving to rapid releases have positive impact on the work health and work-life-balance of software engineers. However, caution is needed as our results are based on a limited set of quantitative data from a single organization.

Submitted 12 April, 2017; originally announced April 2017.

Comments: MSR 2017 conference, short paper

Showing 1–50 of 54 results for author: Adams, B