
On AI-Inspired UI-Design

Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Gérard Dray, Walid Maalej

Abstract: Graphical User Interface (or simply UI) is a primary means of interaction between users and their device. In this paper, we discuss three major complementary approaches on how to use Artificial Intelligence (AI) to support app designers in creating better, more diverse, and creative UIs for mobile apps. First, designers can prompt a Large Language Model (LLM) like GPT to directly generate and adjust one or multiple UIs. Second, a Vision-Language Model (VLM) enables designers to effectively search a large screenshot dataset, e.g. from apps published in app stores. The third approach is to train a Diffusion Model (DM) specifically designed to generate app UIs as inspirational images. We discuss how AI should be used, in general, to inspire and assist creative app design rather than automating it.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.00145 [pdf, other]

GUing: A Mobile GUI Search Engine using a Vision-Language Model

Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej

Abstract: App developers use the Graphical User Interface (GUI) of other apps as an important source of inspiration to design and improve their own apps. In recent years, research suggested various approaches to retrieve GUI designs that fit a certain text query from screenshot datasets acquired through automated GUI exploration. However, such text-to-GUI retrieval approaches only leverage the textual information of the GUI elements in the screenshots, neglecting visual information such as icons or background images. In addition, the retrieved screenshots are not steered by app developers and often lack important app features, e.g. whose UI pages require user authentication. To overcome these limitations, this paper proposes GUing, a GUI search engine based on a vision-language model called UIClip, which we trained specifically for the app GUI domain. For this, we first collected app introduction images from Google Play, which usually display the most representative screenshots selected and often captioned (i.e. labeled) by app vendors. Then, we developed an automated pipeline to classify, crop, and extract the captions from these images. This finally results in a large dataset which we share with this paper: including 303k app screenshots, out of which 135k have captions. We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind in GUI retrieval. We evaluated our approach on various datasets from related work and in a manual experiment. The results demonstrate that our model outperforms previous approaches in text-to-GUI retrieval, achieving a Recall@10 of up to 0.69 and a HIT@10 of 0.91. We also explored the performance of UIClip for other GUI tasks including GUI classification and Sketch-to-GUI retrieval with encouraging results.

Submitted 30 April, 2024; originally announced May 2024.
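
The GUing abstract above reports Recall@10 and HIT@10 for text-to-GUI retrieval. As a rough illustration of how such top-k retrieval metrics are commonly computed, here is a minimal sketch. The function names, the screenshot ids, and the exact definitions are assumptions made for illustration; the paper may define or average these metrics differently.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant item appears in the top-k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


# Hypothetical query result: screenshot ids in the order returned by a search engine.
ranked = ["s3", "s7", "s1", "s9"]
# Hypothetical ground truth: the screenshots judged relevant for the query.
relevant = {"s7", "s4"}

print(recall_at_k(ranked, relevant, k=2))
print(hit_at_k(ranked, relevant, k=2))
```

In practice both values would be averaged over all evaluation queries, which is presumably how a single figure per metric is obtained.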

arXiv:2403.05716 [pdf, other]

Mining Issue Trackers: Concepts and Techniques

Authors: Lloyd Montgomery, Clara Lüders, Walid Maalej

Abstract: An issue tracker is a software tool used by organisations to interact with users and manage various aspects of the software development lifecycle. With the rise of agile methodologies, issue trackers have become popular in open and closed-source settings alike. Internal and external stakeholders report, manage, and discuss "issues", which represent different information such as requirements and maintenance tasks. Issue trackers can quickly become complex ecosystems, with dozens of projects, hundreds of users, thousands of issues, and often millions of issue evolutions. Finding and understanding the relevant issues for the task at hand and keeping an overview becomes difficult with time. Moreover, managing issue workflows for diverse projects becomes more difficult as organisations grow, and more stakeholders get involved. To help address these difficulties, software and requirements engineering research have suggested automated techniques based on mining issue tracking data. Given the vast amount of textual data in issue trackers, many of these techniques leverage natural language processing. This chapter discusses four major use cases for algorithmically analysing issue data to assist stakeholders with the complexity and heterogeneity of information in issue trackers. The chapter is accompanied by a follow-along demonstration package with Jupyter Notebooks.

Submitted 11 July, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 21 pages

arXiv:2312.00582 [pdf, other]

Design Patterns for Machine Learning Based Systems with Human-in-the-Loop

Authors: Jakob Smedegaard Andersen, Walid Maalej

Abstract: The development and deployment of systems using supervised machine learning (ML) remain challenging: mainly due to the limited reliability of prediction models and the lack of knowledge on how to effectively integrate human intelligence into automated decision-making. Human involvement in the ML process is a promising and powerful paradigm to overcome the limitations of pure automated predictions and improve the applicability of ML in practice. We compile a catalog of design patterns to guide developers in selecting and implementing suitable human-in-the-loop (HiL) solutions. Our catalog takes into consideration key requirements such as the cost of human involvement and model retraining. It includes four training patterns, four deployment patterns, and two orthogonal cooperation patterns.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2307.12036 [pdf, other]

Exploring the Relationship Between Personality Traits and User Feedback

Authors: Volodymyr Biryuk, Walid Maalej

Abstract: Previous research has studied the impact of developer personality in different software engineering scenarios, such as team dynamics and programming education. However, little is known about how user personality affects software engineering, particularly user-developer collaboration. Along this line, we present a preliminary study about the effect of personality traits on user feedback. 56 university students provided feedback on different software features of an e-learning tool used in the course. They also filled out a questionnaire for the Five Factor Model (FFM) personality test. We observed some isolated effects of neuroticism on user feedback: most notably a significant correlation between neuroticism and feedback elaborateness; and between neuroticism and the rating of certain features. The results suggest that sensitivity to frustration and lower stress tolerance may negatively impact the feedback of users. This and possibly other personality characteristics should be considered when leveraging feedback analytics for software requirements engineering.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2304.09308 [pdf, other]

From RSSE to BotSE: Potentials and Challenges Revisited after 15 Years

Authors: Walid Maalej

Abstract: Both recommender systems and bots should proactively and smartly answer the questions of software developers or other project stakeholders to assist them in performing their tasks more efficiently. This paper reflects on the achievements from the more mature area of Recommendation Systems in Software Engineering (RSSE) as well as the rising area of Bots in Software Engineering (BotSE). We discuss the similarities and differences, briefly review current state of the art, and highlight three particular areas, in which the full potential is yet to be tapped: a more socio-technical context awareness, assisting knowledge sharing in addition to knowledge access, as well as covering repetitive or stimulative scenarios related to requirements and user-developer interaction.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.09301 [pdf, other]

Developers' Visuo-spatial Mental Model and Program Comprehension

Authors: Abir Bouraffa, Gian-Luca Fuhrmann, Walid Maalej

Abstract: Previous works from research and industry have proposed a spatial representation of code in a canvas, arguing that a navigational code space confers developers the freedom to organise elements according to their understanding. By allowing developers to translate logical relatedness into spatial proximity, this code representation could aid in code navigation and comprehension. However, the association between developers' code comprehension and their visuo-spatial mental model of the code is not yet well understood. This mental model is affected on the one hand by the spatial code representation and on the other by the visuo-spatial working memory of developers. We address this knowledge gap by conducting an online experiment with 20 developers following a between-subject design. The control group used a conventional tab-based code visualization, while the experimental group used a code canvas to complete three code comprehension tasks. Furthermore, we measure the participants' visuo-spatial working memory using a Corsi Block test at the end of the tasks. Our results suggest that, overall, neither the spatial representation of code nor the visuo-spatial working memory of developers has a significant impact on comprehension performance. However, we identified significant differences in the time dedicated to different comprehension activities such as navigation, annotation, and UI interactions.

Submitted 18 April, 2023; originally announced April 2023.

Comments: To appear in 2023 International Conference on Software Engineering (ICSE 2023). Authors' version of the work

arXiv:2303.14253 [pdf]

Testability Refactoring in Pull Requests: Patterns and Trends

Authors: Pavel Reich, Walid Maalej

Abstract: To create unit tests, it may be necessary to refactor the production code, e.g. by widening access to specific methods or by decomposing classes into smaller units that are easier to test independently. We report on an extensive study to understand such composite refactoring procedures for the purpose of improving testability. We collected and studied 346,841 Java pull requests from 621 GitHub projects. First, we compared the atomic refactorings in two populations: pull requests with changed test-pairs (i.e. with co-changes in production and test code and thus potentially including testability refactoring) and pull requests without test-pairs. We found significantly more atomic refactorings in test-pairs pull requests, such as Change Variable Type Operation or Change Parameter Type. Second, we manually analyzed the code changes of 200 pull requests, where developers explicitly mention the terms "testability" or "refactor + test". We identified ten composite refactoring procedures for the purpose of testability, which we call testability refactoring patterns. Third, we manually analyzed an additional 524 test-pairs pull requests: both randomly selected and where we assumed to find testability refactorings, e.g. in pull requests about dependency or concurrency issues. About 25% of all analyzed pull requests actually included testability refactoring patterns. The most frequent were extract a method for override or for invocation, widen access to a method for invocation, and extract a class for invocation. We also report on frequent atomic refactorings which co-occur with the patterns and discuss the implications of our findings for research, practice, and education.

Submitted 24 March, 2023; originally announced March 2023.

Comments: ICSE2023

arXiv:2302.10816 [pdf, other]

doi 10.1109/MC.2023.3243182

Tailoring Requirements Engineering for Responsible AI

Authors: Walid Maalej, Yen Dieu Pham, Larissa Chazette

Abstract: Requirements Engineering (RE) is the discipline for identifying, analyzing, as well as ensuring the implementation and delivery of user, technical, and societal requirements. Recently reported issues concerning the acceptance of Artificial Intelligence (AI) solutions after deployment, e.g. in the medical, automotive, or scientific domains, stress the importance of RE for designing and delivering Responsible AI systems. In this paper, we argue that RE should not only be carefully conducted but also tailored for Responsible AI. We outline related challenges for research and practice.

Submitted 21 February, 2023; originally announced February 2023.

Comments: To appear in IEEE Computer, Special Issue on Software Engineering for Responsible AI

arXiv:2208.01317 [pdf, other]

An Exploratory Study of Documentation Strategies for Product Features in Popular GitHub Projects

Authors: Tim Puhlfürß, Lloyd Montgomery, Walid Maalej

Abstract: [Background] In large open-source software projects, development knowledge is often fragmented across multiple artefacts and contributors such that individual stakeholders are generally unaware of the full breadth of the product features. However, users want to know what the software is capable of, while contributors need to know where to fix, update, and add features. [Objective] This work aims at understanding how feature knowledge is documented in GitHub projects and how it is linked (if at all) to the source code. [Method] We conducted an in-depth qualitative exploratory content analysis of 25 popular GitHub repositories that provided the documentation artefacts recommended by GitHub's Community Standards indicator. We first extracted strategies used to document software features in textual artefacts and then strategies used to link the feature documentation with source code. [Results] We observed feature documentation in all studied projects in artefacts such as READMEs, wikis, and website resource files. However, the features were often described in an unstructured way. Additionally, tracing techniques to connect feature documentation and source code were rarely used. [Conclusions] Our results suggest a lacking (or a low-prioritised) feature documentation in open-source projects, little use of normalised structures, and a rare explicit referencing to source code. As a result, product feature traceability is likely to be very limited, and maintainability to suffer over time.

Submitted 2 August, 2022; originally announced August 2022.

Comments: Accepted for the New Ideas and Emerging Results (NIER) track of the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME)

arXiv:2206.07182 [pdf, other]

Automated Detection of Typed Links in Issue Trackers

Authors: Clara Marie Lüders, Tim Pietz, Walid Maalej

Abstract: Stakeholders in software projects use issue trackers like JIRA to capture and manage issues, including requirements and bugs. To ease issue navigation and structure project knowledge, stakeholders manually connect issues via links of certain types that reflect different dependencies, such as Epic-, Block-, Duplicate-, or Relate- links. Based on a large dataset of 15 JIRA repositories, we study how well state-of-the-art machine learning models can automatically detect common link types. We found that a pure BERT model trained on titles and descriptions of linked issues significantly outperforms other optimized deep learning models, achieving an encouraging average macro F1-score of 0.64 for detecting 9 popular link types across all repositories (weighted F1-score of 0.73). For the specific Subtask- and Epic- links, the model achieved top F1-scores of 0.89 and 0.97, respectively. Our model does not simply learn the textual similarity of the issues. In general, shorter issue text seems to improve the prediction accuracy with a strong negative correlation of -0.70. We found that Relate-links often get confused with the other links, which suggests that they are likely used as default links in unclear cases. We also observed significant differences across the repositories, depending on how they are used and by whom.

Submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted at RE2022, eCF Paper Id: 1655146264348
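
The abstract above reports both a macro F1-score (0.64) and a weighted F1-score (0.73). The gap between the two typically reflects class imbalance: macro averaging treats every link type equally, while weighted averaging scales each type's F1 by its number of true instances. The sketch below illustrates the difference on a tiny, made-up label set; the function names and the example data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class F1 scores, weighted by each class's true count."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Made-up, imbalanced example: three Relate links and one Epic link.
y_true = ["Relate", "Relate", "Relate", "Epic"]
y_pred = ["Relate", "Relate", "Epic", "Epic"]

print(round(macro_f1(y_true, y_pred), 3))
print(round(weighted_f1(y_true, y_pred), 3))
```

Here the frequent class scores higher than the rare one, so the weighted average ends up above the macro average, the same direction as the figures reported in the abstract.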

arXiv:2204.12893 [pdf, other]

doi 10.1145/3524842.3528457

Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems

Authors: Clara Marie Lüders, Abir Bouraffa, Walid Maalej

Abstract: Software projects use Issue Tracking Systems (ITS) like JIRA to track issues and organize the workflows around them. Issues are often inter-connected via different links such as the default JIRA link types Duplicate, Relate, Block, or Subtask. While previous research has mostly focused on analyzing and predicting duplication links, this work aims at understanding the various other link types, their prevalence, and characteristics towards a more reliable link type prediction. For this, we studied 607,208 links connecting 698,790 issues in 15 public JIRA repositories. Besides the default types, the custom types Depend, Incorporate, Split, and Cause were also common. We manually grouped all 75 link types used in the repositories into five general categories: General Relation, Duplication, Composition, Temporal / Causal, and Workflow. Comparing the structures of the corresponding graphs, we observed several trends. For instance, Duplication links tend to represent simpler issue graphs often with two components and Composition links present the highest amount of hierarchical tree structures (97.7%). Surprisingly, General Relation links have a significantly higher transitivity score than Duplication and Temporal / Causal links. Motivated by the differences between the link types and by their popularity, we evaluated the robustness of two state-of-the-art duplicate detection approaches from the literature on the JIRA dataset. We found that current deep-learning approaches confuse between Duplication and other links in almost all repositories. On average, the classification accuracy dropped by 6% for one approach and 12% for the other. Extending the training sets with other link types seems to partly solve this issue. We discuss our findings and their implications for research and practice.

Submitted 27 April, 2022; originally announced April 2022.

Comments: 19th International Conference on Mining Software Repositories (MSR '22), May 23-24, 2022, Pittsburgh, PA, USA. ACM DOI: 10.1145/3524842.3528457

arXiv:2204.01334 [pdf, other]

Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers

Authors: Jakob Smedegaard Andersen, Walid Maalej

Abstract: To maximize the accuracy and increase the overall acceptance of text classifiers, we propose a framework for the efficient, in-operation moderation of classifiers' output. Our framework focuses on use cases in which F1-scores of modern Neural Networks classifiers (ca. 90%) are still inapplicable in practice. We suggest a semi-automated approach that uses prediction uncertainties to pass unconfident, probably incorrect classifications to human moderators. To minimize the workload, we limit the human moderated data to the point where the accuracy gains saturate and further human effort does not lead to substantial improvements. A series of benchmarking experiments based on three different datasets and three state-of-the-art classifiers show that our framework can improve the classification F1-scores by 5.1 to 11.2% (up to approx. 98 to 99%), while reducing the moderation load up to 73.3% compared to a random moderation.

Submitted 4 April, 2022; originally announced April 2022.
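
The abstract above describes passing unconfident classifications to human moderators based on prediction uncertainty. A minimal sketch of that general idea follows. It assumes a simple least-confidence score (one minus the highest class probability) and a fixed moderation budget; both are illustrative choices, and the paper's actual uncertainty estimates and saturation-based stopping rule are not reproduced here. The function name and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def route_for_moderation(
    probabilities: Sequence[Sequence[float]], budget: float
) -> List[int]:
    """Return the indices of the predictions to hand to human moderators.

    probabilities: one list of class probabilities per classified text.
    budget: fraction of all predictions that moderators can review (0.0 to 1.0).
    """
    # Assumption: least-confidence uncertainty, i.e. 1 - max class probability.
    uncertainty = [1.0 - max(p) for p in probabilities]
    # Rank predictions from most to least uncertain.
    order = sorted(range(len(probabilities)), key=lambda i: uncertainty[i], reverse=True)
    n_moderated = int(len(probabilities) * budget)
    return sorted(order[:n_moderated])


# Hypothetical classifier outputs for four texts and two classes.
probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]
print(route_for_moderation(probs, budget=0.5))
```

With a budget of 0.5, the two least confident predictions are routed to moderators while the confident ones stay automated. A random-moderation baseline, as mentioned in the abstract, would pick the same number of indices without looking at the uncertainty.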

arXiv:2201.08368 [pdf, other]

doi 10.1145/3524842.3528486

An Alternative Issue Tracking Dataset of Public Jira Repositories

Authors: Lloyd Montgomery, Clara Lüders, Walid Maalej

Abstract: Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can be independently improved, commented on, linked to other issues, and progressed through the organisational workflow. Commonly studied ITSs so far include GitHub, GitLab, and Bugzilla, while Jira, one of the most popular ITSs in practice with a wealth of additional information, has yet to receive similar attention. Unfortunately, diverse public Jira datasets are rare, likely due to the difficulty in finding and accessing these repositories. With this paper, we release a dataset of 16 public Jiras with 1822 projects, spanning 2.7 million issues with a combined total of 32 million changes, 9 million comments, and 1 million issue links. We believe this Jira dataset will lead to many fruitful research projects investigating issue evolution, issue linking, cross-project analysis, as well as cross-tool analysis when combined with existing well-studied ITS datasets.

Submitted 25 March, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 5 pages

arXiv:2108.08543 [pdf, other]

Unsupervised Topic Discovery in User Comments

Authors: Christoph Stanik, Tim Pietz, Walid Maalej

Abstract: On social media platforms like Twitter, users regularly share their opinions and comments with software vendors and service providers. Popular software products might get thousands of user comments per day. Research has shown that such comments contain valuable information for stakeholders, such as feature ideas, problem reports, or support inquiries. However, it is hard to manually manage and grasp a large amount of user comments, which can be redundant and of a different quality. Consequently, researchers suggested automated approaches to extract valuable comments, e.g., through problem report classifiers. However, these approaches do not aggregate semantically similar comments into specific aspects to provide insights like how often users reported a certain problem. We introduce an approach for automatically discovering topics composed of semantically similar user comments based on deep bidirectional natural language processing algorithms. Stakeholders can use our approach without the need to configure critical parameters like the number of clusters. We present our approach and report on a rigorous multiple-step empirical evaluation to assess how cohesive and meaningful the resulting clusters are. Each evaluation step was peer-coded and resulted in inter-coder agreements of up to 98%, giving us high confidence in the approach. We also report a thematic analysis on the topics discovered from tweets in the telecommunication domain.

Submitted 19 August, 2021; originally announced August 2021.

Comments: Accepted for 29th IEEE International Requirements Engineering Conference

arXiv:2108.05622 [pdf, other]

doi 10.1109/RE51729.2021.00034

Lessons Learned from Customizing and Applying ACTA to Design a Novel Device for Emergency Medical Care

Authors: Christoph Stanik, Tim Puhlfürß, Anne Mahler, Phillip Brenya Sasu, Wikhart Reip, Walid Maalej

Abstract: Preclinical patient care is both mentally and physically challenging and exhausting for emergency teams. The teams intensively use medical technology to help the patient on site. However, they must carry and handle multiple heavy medical devices such as a monitor for the patient's vital signs, a ventilator to support an unconscious patient, and a resuscitation device. In an industry project, we aim at developing a combined device that lowers the emergency teams' mental and physical load caused by multiple screens, devices, and their high weight. The focus of this paper is to describe our ideation and requirements elicitation process regarding the user interface design of the combined device. For one year, we applied a fully digital customized version of the Applied Cognitive Task Analysis (ACTA) method to systematically elicit the requirements. Domain and requirements engineering experts created a detailed hierarchical task diagram of an extensive emergency scenario, conducted eleven interviews with subject matter experts (SMEs), and executed two design workshops, which led to 34 sketches and three mockups of the combined device's user interface. Cross-functional teams accompanied the entire process and brought together expertise in preclinical patient care, requirements engineering, and medical product development. We report on the lessons learned for each of the four consecutive stages of our customized ACTA process.

Submitted 4 August, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Accepted for publication at the 29th IEEE International Requirements Engineering Conference

arXiv:2102.07134 [pdf, other]

Automatically Matching Bug Reports With Related App Reviews

Authors: Marlo Häring, Christoph Stanik, Walid Maalej

Abstract: App stores allow users to give valuable feedback on apps, and developers to find this feedback and use it for the software evolution. However, finding user feedback that matches existing bug reports in issue trackers is challenging as users and developers often use a different language. In this work, we introduce DeepMatcher, an automatic approach using state-of-the-art deep learning methods to match problem reports in app reviews to bug reports in issue trackers. We evaluated DeepMatcher with four open-source apps quantitatively and qualitatively. On average, DeepMatcher achieved a hit ratio of 0.71 and a Mean Average Precision of 0.55. For 91 problem reports, DeepMatcher did not find any matching bug report. When manually analyzing these 91 problem reports and the issue trackers of the studied apps, we found that in 47 cases, users actually described a problem before developers discovered and documented it in the issue tracker. We discuss our findings and different use cases for DeepMatcher.

Submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted for publication to the 43rd International Conference on Software Engineering (ICSE21)

arXiv:2010.14212 [pdf, other]

doi 10.1109/REW.2019.00008

Renovating Requirements Engineering: First Thoughts to Shape Requirements Engineering as a Profession

Authors: Yen Dieu Pham, Lloyd Montgomery, Walid Maalej

Abstract: Legacy software systems typically include vital data for organizations that use them and should thus be regularly maintained. Ideally, organizations should rely on Requirements Engineers to understand and manage changes of stakeholder needs and system constraints. However, due to time and cost pressure, and with a heavy focus on implementation, organizations often choose to forgo Requirements Engineers and rather focus on ad-hoc bug fixing and maintenance. This position paper discusses what Requirements Engineers could possibly learn from other similar roles to become crucial for the evolution of legacy systems. Particularly, we compare the roles of Requirements Engineers (according to IREB), Building Architects (according to the German regulations), and Product Owners (according to "The Scrum-Guide"). We discuss overlaps along four dimensions: liability, self-portrayal, core activities, and artifacts. Finally, we draw insights from these related fields to foster the concept of a Requirements Engineer as a distinguished profession.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 5 pages, 1 figure, 1 table, accepted at the 2019 IEEE 3rd International Workshop on Learning from other Disciplines for RE (D4RE) at ICSE

arXiv:1909.07699 [pdf, other]

OpenReq Issue Link Map: A Tool to Visualize Issue Links in Jira

Authors: Clara Marie Lüders, Mikko Raatikainen, Joaquim Motger, Walid Maalej

Abstract: Managing software projects gets more and more complicated with an increasing project and product size. To cope with this complexity, many organizations use issue tracking systems, where tasks, bugs, and requirements are stored as issues. Unfortunately, managing software projects might remain chaotic even when using issue trackers. Particularly for long-lasting projects with a large number of issues and links between them, it is often hard to maintain an overview of the dependencies, especially when dozens of new issues get reported every day. We present a Jira plug-in that supports developers, project managers, and product owners in managing and overviewing issues and their dependencies. Our tool visualizes the issue links, helps to find missing or unknown links between issues, and detects inconsistencies.

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1909.05740 [pdf, other]

Requirements Intelligence with OpenReq Analytics

Authors: Christoph Stanik, Walid Maalej

Abstract: With the rise of social media like Twitter and distribution platforms like app stores, users have various ways to express their opinions about software products. Popular software vendors get user feedback thousandfold per day. Research has shown that such feedback contains valuable information for software development teams. However, a manual analysis of user feedback is cumbersome and hard to manage. We present OpenReq Analytics, a software requirements intelligence service that collects, processes, analyzes, and visualizes user feedback.

Submitted 12 September, 2019; originally announced September 2019.

Comments: tool paper

arXiv:1909.05504 [pdf, other]

Classifying Multilingual User Feedback using Traditional Machine Learning and Deep Learning

Authors: Christoph Stanik, Marlo Haering, Walid Maalej

Abstract: With the rise of social media like Twitter and of software distribution platforms like app stores, users have various ways to express their opinion about software products. Popular software vendors get user feedback thousandfold per day. Research has shown that such feedback contains valuable information for software development teams such as problem reports or feature and support inquiries. Since the manual analysis of user feedback is cumbersome and hard to manage, many researchers and tool vendors suggested using automated analyses based on traditional supervised machine learning approaches. In this work, we compare the results of traditional machine learning and deep learning in classifying user feedback in English and Italian into problem reports, inquiries, and irrelevant. Our results show that using traditional machine learning, we can still achieve comparable results to deep learning, although we collected thousands of labels.

Submitted 12 September, 2019; originally announced September 2019.

arXiv:1907.13395 [pdf, other]

Extracting and Analyzing Context Information in User-Support Conversations on Twitter

Authors: Daniel Martens, Walid Maalej

Abstract: While many apps include built-in options to report bugs or request features, users still provide an increasing amount of feedback via social media, like Twitter. Compared to traditional issue trackers, the reporting process in social media is unstructured and the feedback often lacks basic context information, such as the app version or the device concerned when experiencing the issue. To make this feedback actionable to developers, support teams engage in recurring, effortful conversations with app users to clarify missing context items. This paper introduces a simple approach that accurately extracts basic context information from unstructured, informal user feedback on mobile apps, including the platform, device, app version, and system version. Evaluated against a truthset of 3014 tweets from official Twitter support accounts of the 3 popular apps Netflix, Snapchat, and Spotify, our approach achieved precisions from 81% to 99% and recalls from 86% to 98% for the different context item types. Combined with a chatbot that automatically requests missing context items from reporting users, our approach aims at auto-populating issue trackers with structured bug reports.

Submitted 31 July, 2019; originally announced July 2019.

arXiv:1907.09807 [pdf, other]

doi 10.1145/3338906.3338943

On Using Machine Learning to Identify Knowledge in API Reference Documentation

Authors: Davide Fucci, Alireza Mollaalizadehbahnemiri, Walid Maalej

Abstract: Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) and deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., multiple binary classifiers) the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary. For four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification) deep learning outperforms naïve baselines and traditional machine learning achieving a MacroAUC up to 79%. We also compared classifiers using embeddings pre-trained on generic text corpora and StackOverflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. The accuracy related to the remaining types seems API-specific. We discuss our results and how they inform the development of tools for supporting developers sharing and accessing API knowledge. Published article: https://doi.org/10.1145/3338906.3338943

Submitted 23 July, 2019; originally announced July 2019.

Journal ref: ESEC/FSE2019

arXiv:1906.06403 [pdf, other]

Release early, release often, and watch your users' emotions

Authors: Daniel Martens, Walid Maalej

Abstract: App stores are highly competitive markets, sometimes offering dozens of apps for a single use case. Unexpected app changes such as a feature removal might incite even loyal users to explore alternative apps. Sentiment analysis tools can help monitor users' emotions expressed, e.g., in app reviews or tweets. We found that these emotions include four recurring patterns corresponding to the app releases. Based on these patterns and online reports about popular apps, we derived five release lessons to help app vendors maintain positive emotions and gain competitive advantages.

Submitted 14 June, 2019; originally announced June 2019.

arXiv:1904.12607 [pdf, other]

doi 10.1007/s10664-019-09706-9

Towards Understanding and Detecting Fake Reviews in App Stores

Authors: Daniel Martens, Walid Maalej

Abstract: App stores include an increasing amount of user feedback in the form of app ratings and reviews. Research and recently also tool vendors have proposed analytics and data mining solutions to leverage this feedback to developers and analysts, e.g., for supporting release decisions. Research also showed that positive feedback improves apps' downloads and sales figures and thus their success. As a side effect, a market for fake, incentivized app reviews emerged with yet unclear consequences for developers, app users, and app store operators. This paper studies fake reviews, their providers, characteristics, and how well they can be automatically detected. We conducted disguised questionnaires with 43 fake review providers and studied their review policies to understand their strategies and offers. By comparing 60,000 fake reviews with 62 million reviews from the Apple App Store we found significant differences, e.g., between the corresponding apps, reviewers, rating distribution, and frequency. This inspired the development of a simple classifier to automatically detect fake reviews in app stores. On a labelled and imbalanced dataset including one-tenth of fake reviews, as reported in other domains, our classifier achieved a recall of 91% and an AUC/ROC value of 98%. We discuss our findings and their impact on software engineering, app users, and app store operators.

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1810.01114 [pdf, other]

doi 10.1145/3274336

Who is Addressed in this Comment? Automatically Classifying Meta-Comments in News Comments

Authors: Marlo Häring, Wiebke Loosen, Walid Maalej

Abstract: User comments have become an essential part of online journalism. However, newsrooms are often overwhelmed by the vast number of diverse comments, for which a manual analysis is barely feasible. Identifying meta-comments that address or mention newsrooms, individual journalists, or moderators and that may call for reactions is particularly critical. In this paper, we present an automated approach to identify and classify meta-comments. We compare comment classification based on manually extracted features with an end-to-end learning approach. We develop, optimize, and evaluate multiple classifiers on a comment dataset of the large German online newsroom SPIEGEL Online and the 'One Million Posts' corpus of DER STANDARD, an Austrian newspaper. Both optimized classification approaches achieved encouraging $F_{0.5}$ values between 76% and 91%. We report on the most significant classification features with the results of a qualitative analysis and discuss how our work contributes to making participation in online journalism more constructive.

Submitted 2 October, 2018; originally announced October 2018.

Comments: Accepted for publication to the 21st ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW18)

arXiv:1808.02284 [pdf, other]

Needs and Challenges for a Platform to Support Large-scale Requirements Engineering. A Multiple Case Study

Authors: Davide Fucci, Cristina Palomares, Dolors Costal, Xavier Franch, Mikko Raatikainen, Martin Stettinger, Zijad Kurtanovic, Tero Kojo, Lars Koenig, Andreas Falkner, Gottfried Schenner, Fabrizio Brasca, Tomi Männistö, Alexander Felfernig, Walid Maalej

Abstract: Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as number and heterogeneity of stakeholders, motivate the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirement engineering for individual stakeholders, based on the current pain-points of their organizations when dealing with a large number of requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain-point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimation of how long a requirements engineering task will take to complete, along with an easier requirements dependency identification and requirements reuse strategy. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirement engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing or design new practices and tools.

Submitted 6 September, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: Accepted for publication to the 12th International Symposium on Empirical Software Engineering and Measurement (ESEM18)

arXiv:1807.00518 [pdf, other]

doi 10.1109/MS.2017.46

App Store 2.0: From Crowd Information to Actionable Feedback in Mobile Ecosystems

Authors: María Gómez, Bram Adams, Walid Maalej, Martin Monperrus, Romain Rouvoy

Abstract: Given the increasing competition in mobile app ecosystems, improving the experience of users has become a major goal for app vendors. This article introduces a visionary app store, called APP STORE 2.0, which exploits crowdsourced information about apps, devices and users to increase the overall quality of the delivered mobile apps. We sketch a blueprint architecture of the envisioned app stores and discuss the different kinds of actionable feedback that app stores can generate using crowdsourced information.

Submitted 2 July, 2018; originally announced July 2018.

Journal ref: IEEE Software, Institute of Electrical and Electronics Engineers, 2017, 34, pp.81-89

arXiv:1806.02592 [pdf, other]

A Simple NLP-based Approach to Support Onboarding and Retention in Open Source Communities

Authors: Christoph Stanik, Lloyd Montgomery, Daniel Martens, Davide Fucci, Walid Maalej

Abstract: Successful open source communities are constantly looking for new members and helping them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to address our research goals. Random Forest achieved up to 91% precision (F1-score 72%) towards the first goal while for the second goal, Decision Tree achieved a precision of 92% (F1-score 91%). A qualitative evaluation gave insights on what information in the issue description is helpful for newcomers. Our approach can be used to automatically identify, label, and recommend issues for newcomers in open source software projects based only on the text of the issues.

Submitted 16 August, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

arXiv:1803.10587 [pdf]

A First Implementation of a Design Thinking Workshop During a Mobile App Development Project Course

Authors: Yen Dieu Pham, Davide Fucci, Walid Maalej

Abstract: Due to their characteristics, millennials prefer learning-by-doing and social learning, such as project-based learning. However, software development projects require not only technical skills but also creativity; Design Thinking can serve this purpose. We conducted a workshop following the Design Thinking approach of the d.school, to help students generate ideas for a mobile app development project course. On top of the details for implementing the workshop, we report our observations, lessons learned, and provide suggestions for further implementation.

Submitted 28 March, 2018; originally announced March 2018.

Comments: Second IEEE/ACM International Workshop on Software Engineering Education for Millennials

arXiv:1803.01661 [pdf, other]

ReviewChain: Untampered Product Reviews on the Blockchain

Authors: Daniel Martens, Walid Maalej

Abstract: Online portals include an increasing amount of user feedback in the form of ratings and reviews. Recent research highlighted the importance of this feedback and confirmed that positive feedback improves product sales figures and thus its success. However, online portals' operators act as central authorities throughout the overall review process. In the worst case, operators can exclude users from submitting reviews, modify existing reviews, and introduce fake reviews by fictional consumers. This paper presents ReviewChain, a decentralized review approach. Our approach avoids central authorities by using blockchain technologies, decentralized apps and storage. Thereby, we enable users to submit and retrieve untampered reviews. We highlight the implementation challenges encountered when realizing our approach on the public Ethereum blockchain. For each implementation challenge, we discuss possible design alternatives and their trade-offs regarding costs, security, and trustworthiness. Finally, we analyze which design decision should be chosen to support specific trade-offs and present resulting combinations of decentralized blockchain technologies, also with conventional centralized technologies.

Submitted 5 March, 2018; originally announced March 2018.

arXiv:1707.08824 [pdf, other]

doi 10.1145/3121257.3121260

Find, Understand, and Extend Development Screencasts on YouTube

Authors: Mathias Ellmann, Alexander Oeser, Davide Fucci, Walid Maalej

Abstract: A software development screencast is a video that captures the screen of a developer working on a particular task while explaining its implementation details. Due to the increased popularity of software development screencasts (e.g., available on YouTube), we study how and to what extent they can be used as an additional source of knowledge to answer developers' questions about, for example, the use of a specific API. We first differentiate between development and other types of screencasts using video frame analysis. By using the Cosine algorithm, developers can expect ten development screencasts in the top 20 out of 100 different YouTube videos. We then extracted popular development topics on which screencasts are reporting on YouTube: database operations, system set-up, plug-in development, game development, and testing. Besides, we found six recurring tasks performed in development screencasts, such as object usage and UI operations. Finally, we conducted a similarity analysis by considering only the spoken words (i.e., the screencast transcripts but not the text that might appear in a scene) to link API documents, such as the Javadoc, to the appropriate screencasts. By using Cosine similarity, we identified 38 relevant documents in the top 20 out of 9455 API documents.

Submitted 27 July, 2017; originally announced July 2017.

Showing 1–32 of 32 results for author: Maalej, W