Search | arXiv e-print repository

Navigating Fairness: Practitioners' Understanding, Challenges, and Strategies in AI/ML Development

Authors: Aastha Pant, Rashina Hoda, Chakkrit Tantithamthavorn, Burak Turhan

Abstract: The rise in the use of AI/ML applications across industries has sparked more discussions about the fairness of AI/ML in recent times. While prior research on the fairness of AI/ML exists, there is a lack of empirical studies focused on understanding the views and experiences of AI practitioners in developing a fair AI/ML. Understanding AI practitioners' views and experiences on the fairness of AI/ML is important because they are directly involved in its development and deployment and their insights can offer valuable real-world perspectives on the challenges associated with ensuring fairness in AI/ML. We conducted semi-structured interviews with 22 AI practitioners to investigate their understanding of what a 'fair AI/ML' is, the challenges they face in developing a fair AI/ML, the consequences of developing an unfair AI/ML, and the strategies they employ to ensure AI/ML fairness. We developed a framework showcasing the relationship between AI practitioners' understanding of 'fair AI/ML' and (i) their challenges in its development, (ii) the consequences of developing an unfair AI/ML, and (iii) strategies used to ensure AI/ML fairness. Additionally, we also identify areas for further investigation and offer recommendations to aid AI practitioners and AI companies in navigating fairness.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 31 pages, 8 figures, 2 tables

arXiv:2402.16546 [pdf, other]

doi 10.1145/3638245

Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects

Authors: Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu

Abstract: Deep Learning (DL) models have rapidly advanced, focusing on achieving high performance through testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or functionally correct when there is a need to treat and test them like other software systems. Therefore, we empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit tested DL projects have positive correlation with the open-source project metrics and have a higher acceptance rate of pull requests, 2) 68% of the sampled DL projects are not unit tested at all, 3) the layer and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to this community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.

Submitted 26 February, 2024; originally announced February 2024.

Comments: ACM Transactions on Software Engineering and Methodology (2023)

arXiv:2402.05016 [pdf]

PhosNetVis: a web-based tool for fast kinase-substrate enrichment analysis and interactive 2D/3D network visualizations of phosphoproteomics data

Authors: Osho Rawal, Berk Turhan, Irene Font Peradejordi, Shreya Chandrasekar, Selim Kalayci, Sacha Gnjatic, Jeffrey Johnson, Mehdi Bouhaddou, Zeynep H. Gümüş

Abstract: Protein phosphorylation involves the reversible modification of a protein (substrate) residue by another protein (kinase). Liquid chromatography-mass spectrometry studies are rapidly generating massive protein phosphorylation datasets across multiple conditions. Researchers then must infer kinases responsible for changes in phosphosites of each substrate. However, tools that infer kinase-substrate interactions (KSIs) are not optimized to interactively explore the resulting large and complex networks, significant phosphosites, and states. There is thus an unmet need for a tool that facilitates user-friendly analysis, interactive exploration, visualization, and communication of phosphoproteomics datasets. We present PhosNetVis, a web-based tool for researchers of all computational skill levels to easily infer, generate and interactively explore KSI networks in 2D or 3D by streamlining phosphoproteomics data analysis steps within a single tool. PhosNetVis lowers barriers for researchers in rapidly generating high-quality visualizations to gain biological insights from their phosphoproteomics datasets. It is available at: https://gumuslab.github.io/PhosNetVis/

Submitted 4 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Added new author, added references, changed fig1 and fig4

arXiv:2312.14648 [pdf, ps, other]

Inconsistency of Score-Elevated Reserve Policy for Indian Affirmative Action

Authors: Orhan Aygün, Bertan Turhan

Abstract: India has enacted an intricate affirmative action program through a reservation system since the 1950s. Notably, in 2008, a historic judgment by the Supreme Court of India (SCI) in the case of Ashoka Kumar Thakur vs. Union of India mandated a 27 percent reservation to the Other Backward Classes (OBC). The SCI's ruling suggested implementing the OBC reservation as a soft reserve without defining a procedural framework. The SCI recommended a maximum of 10 points difference between the cutoff scores of the open-category and OBC positions. We show that this directive conflicts with India's fundamental Supreme Court mandates on reservation policy. Moreover, we show that the score-elevated reserve policy proposed by Sönmez and Yenmez (2022) is inconsistent with this directive.

Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2310.02660 [pdf, ps, other]

Affirmative Action in India: Restricted Strategy Space, Complex Constraints, and Direct Mechanism Design

Authors: Orhan Aygün, Bertan Turhan

Abstract: Since 1950, India has instituted an intricate affirmative action program through a meticulously designed reservation system. This system incorporates vertical and horizontal reservations to address historically marginalized groups' socioeconomic imbalances. Vertical reservations designate specific quotas of available positions in publicly funded educational institutions and government employment for Scheduled Castes, Scheduled Tribes, Other Backward Classes, and Economically Weaker Sections. Concurrently, horizontal reservations are employed within each vertical category to allocate positions for additional subgroups, such as women and individuals with disabilities. In educational admissions, the legal framework recommended that unfilled positions reserved for the OBC category revert to unreserved status. Moreover, we document that individuals from vertically reserved categories have more complicated preferences over institution-vertical category position pairs, even though authorities only elicit their preferences over institutions. To address these challenges, the present paper proposes a novel class of choice rules, termed the Generalized Lexicographic (GL) choice rules. This class is comprehensive, subsuming the most salient priority structures discussed in the extant matching literature. Utilizing the GL choice rules and the deferred acceptance mechanism, we present a robust framework that generates equitable and effective solutions for resource allocation problems in the Indian context.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2307.10057 [pdf, other]

Ethics in the Age of AI: An Analysis of AI Practitioners' Awareness and Challenges

Authors: Aastha Pant, Rashina Hoda, Simone V. Spiegler, Chakkrit Tantithamthavorn, Burak Turhan

Abstract: Ethics in AI has become a debated topic of public and expert discourse in recent years. But what do people who build AI - AI practitioners - have to say about their understanding of AI ethics and the challenges associated with incorporating it in the AI-based systems they develop? Understanding AI practitioners' views on AI ethics is important as they are the ones closest to the AI systems and can bring about changes and improvements. We conducted a survey aimed at understanding AI practitioners' awareness of AI ethics and their challenges in incorporating ethics. Based on 100 AI practitioners' responses, our findings indicate that the majority of AI practitioners had a reasonable familiarity with the concept of AI ethics, primarily due to workplace rules and policies. Privacy protection and security was the ethical principle that the majority of them were aware of. Formal education/training was considered somewhat helpful in preparing practitioners to incorporate AI ethics. The challenges that AI practitioners faced in the development of ethical AI-based systems included (i) general challenges, (ii) technology-related challenges and (iii) human-related challenges. We also identified areas needing further investigation and provided recommendations to assist AI practitioners and companies in incorporating ethics into AI development.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 36 pages, 8 figures, 4 tables

arXiv:2305.11758 [pdf, ps, other]

The Over-and-Above Implementation of Reserve Policy in India

Authors: Orhan Aygün, Bertan Turhan

Abstract: The over-and-above choice rule is the prominent selection procedure to implement affirmative action. In India, it is legally mandated to allocate public school seats and government job positions. This paper presents an axiomatic characterization of the over-and-above choice rule by rigorously stating policy goals as formal axioms. Moreover, we characterize the deferred acceptance mechanism coupled with the over-and-above choice rules for centralized marketplaces.

Submitted 22 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2211.07142 [pdf, other]

Automated Detection, Categorisation and Developers' Experience with the Violations of Honesty in Mobile Apps

Authors: Humphrey O. Obie, Hung Du, Kashumi Madampe, Mojtaba Shahin, Idowu Ilekura, John Grundy, Li Li, Jon Whittle, Burak Turhan, Hourieh Khalajzadeh

Abstract: Human values such as honesty, social responsibility, fairness, privacy, and the like are things considered important by individuals and society. Software systems, including mobile software applications (apps), may ignore or violate such values, leading to negative effects in various ways for individuals and society. While some works have investigated different aspects of human values in software engineering, this mixed-methods study focuses on honesty as a critical human value. In particular, we studied (i) how to detect honesty violations in mobile apps, (ii) the types of honesty violations in mobile apps, and (iii) the perspectives of app developers on these detected honesty violations. We first develop and evaluate 7 machine learning (ML) models to automatically detect violations of the value of honesty in app reviews from an end user perspective. The most promising was a Deep Neural Network model with an F1 score of 0.921. We then conducted a manual analysis of 401 reviews containing honesty violations and characterised honesty violations in mobile apps into 10 categories: unfair cancellation and refund policies; false advertisements; delusive subscriptions; cheating systems; inaccurate information; unfair fees; no service; deletion of reviews; impersonation; and fraudulent looking apps. A developer survey and interview study with mobile developers then identified 7 key causes behind honesty violations in mobile apps and 8 strategies to avoid or fix such violations. The findings of our developer study also articulate the negative consequences that honesty violations might bring for businesses, developers, and users. Finally, the app developers' feedback shows that our prototype ML-based models can have promising benefits in practice.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Submitted to Empirical Software Engineering Journal. arXiv admin note: substantial text overlap with arXiv:2203.07547

arXiv:2206.09514 [pdf, other]

Ethics in AI through the Practitioner's View: A Grounded Theory Literature Review

Authors: Aastha Pant, Rashina Hoda, Chakkrit Tantithamthavorn, Burak Turhan

Abstract: The term ethics is widely used, explored, and debated in the context of developing Artificial Intelligence (AI) based software systems. In recent years, numerous incidents have raised the profile of ethical issues in AI development and led to public concerns about the proliferation of AI technology in our everyday lives. But what do we know about the views and experiences of those who develop these systems -- the AI practitioners? We conducted a grounded theory literature review (GTLR) of 38 primary empirical studies that included AI practitioners' views on ethics in AI and analysed them to derive five categories: practitioner awareness, perception, need, challenge, and approach. These are underpinned by multiple codes and concepts that we explain with evidence from the included studies. We present a taxonomy of ethics in AI from practitioners' viewpoints to assist AI practitioners in identifying and understanding the different aspects of AI ethics. The taxonomy provides a landscape view of the key aspects that concern AI practitioners when it comes to ethics in AI. We also share an agenda for future research studies and recommendations for practitioners, managers, and organisations to help in their efforts to better consider and implement ethics in AI.

Submitted 19 February, 2024; v1 submitted 19 June, 2022; originally announced June 2022.

Comments: 57 pages, 6 figures, 3 tables

arXiv:2204.08674 [pdf, other]

Software Engineers Response to Public Crisis: Lessons Learnt from Spontaneously Building an Informative COVID-19 Dashboard

Authors: Han Wang, Chao Wu, Chunyang Chen, Burak Turhan, Shiping Chen, Jon Whittle

Abstract: The Coronavirus disease 2019 (COVID-19) outbreak quickly spread around the world, resulting in over 240 million infections and 4 million deaths by Oct 2021. While the virus is spreading from person to person silently, fear has also been spreading around the globe. The COVID-19 information from the Australian Government is convincing but not timely or detailed, and there is much information on social networks with both facts and rumors. As software engineers, we have spontaneously and rapidly constructed a COVID-19 information dashboard aggregating reliable information semi-automatically checked from different sources to provide a one-stop information-sharing site about the latest status in Australia. Inspired by the Johns Hopkins University COVID-19 Map, our dashboard contains the case statistics, case distribution, government policy, and latest news, with interactive visualization. In this paper, we present a participant's in-person observations in which the authors acted as founders of https://covid-19-au.com/ serving more than 830K users with 14M page views since March 2020. According to our first-hand experience, we summarize 9 lessons for developers, researchers and instructors. These lessons may inspire the development, research and teaching in software engineering aspects for coping with similar public crises in the future.

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2203.07547 [pdf]

On the Violation of Honesty in Mobile Apps: Automated Detection and Categories

Authors: Humphrey O. Obie, Idowu Ilekura, Hung Du, Mojtaba Shahin, John Grundy, Li Li, Jon Whittle, Burak Turhan

Abstract: Human values such as integrity, privacy, curiosity, security, and honesty are guiding principles for what people consider important in life. Such human values may be violated by mobile software applications (apps), and the negative effects of such human value violations can be seen in various ways in society. In this work, we focus on the human value of honesty. We present a model to support the automatic identification of violations of the value of honesty from app reviews from an end-user perspective. Beyond the automatic detection of honesty violations by apps, we also aim to better understand different categories of honesty violations expressed by users in their app reviews. The result of our manual analysis of our honesty violations dataset shows that honesty violations can be characterised into ten categories: unfair cancellation and refund policies; false advertisements; delusive subscriptions; cheating systems; inaccurate information; unfair fees; no service; deletion of reviews; impersonation; and fraudulent-looking apps. Based on these results, we argue for a conscious effort in developing more honest software artefacts including mobile apps, and the promotion of honesty as a key value in software development practices. Furthermore, we discuss the role of app distribution platforms as enforcers of ethical systems supporting human values, and highlight some proposed next steps for human values in software engineering (SE) research.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 12 pages, Accepted for publication in 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)

arXiv:2110.09366 [pdf]

doi 10.1109/TSE.2021.3113558

Use and Misuse of the Term Experiment in Mining Software Repositories Research

Authors: Claudia Ayala, Burak Turhan, Xavier Franch, Natalia Juristo

Abstract: The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) has fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to characterize the empirical methods they use into the existing empirical SE body of knowledge. This is especially the case of MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences with experiments traditionally acknowledged in SE so far, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to be an experiment are in fact not experiments at all but observational studies, so they use the term in a misleading way. From the remaining 81% of the papers, only one of them refers to a genuine controlled experiment while the others stand for experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments.

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2110.02682 [pdf, other]

How good does a Defect Predictor need to be to guide Search-Based Software Testing?

Authors: Anjana Perera, Burak Turhan, Aldeida Aleti, Marcel Böhme

Abstract: Defect predictors, static bug detectors and humans inspecting the code can locate the parts of the program that are buggy before they are discovered through testing. Automated test generators such as search-based software testing (SBST) techniques can use this information to direct their search for test cases to likely buggy code, thus speeding up the process of detecting existing bugs. However, often the predictions given by these tools or humans are imprecise, which can misguide the SBST technique and may deteriorate its performance. In this paper, we study the impact of imprecision in defect prediction on the bug detection effectiveness of SBST. Our study finds that the recall of the defect predictor, i.e., the probability of correctly identifying buggy code, has a significant impact on bug detection effectiveness of SBST with a large effect size. On the other hand, the effect of precision, a measure for false alarms, is not of meaningful practical significance as indicated by a very small effect size. In particular, the SBST technique finds 7.5 fewer bugs on average (out of 420 bugs) for every 5% decrement of the recall. In the context of combining defect prediction and SBST, our recommendation for practice is to increase the recall of defect predictors at the expense of precision, while maintaining a precision of at least 75%. To account for the imprecision of defect predictors, in particular low recall values, SBST techniques should be designed to search for test cases that also cover the predicted non-buggy parts of the program, while prioritising the parts that have been predicted as buggy.

Submitted 6 October, 2021; originally announced October 2021.

Comments: 12 pages, 4 figures

ACM Class: D.2.5
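
Note on the recall figure quoted in the abstract of arXiv:2110.02682: the reported average of 7.5 fewer bugs (out of 420) per 5% recall decrement can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: it assumes the effect scales linearly with the size of the recall drop, which the abstract does not claim, and the function name `expected_bugs_lost` is hypothetical.

```python
# Figures quoted in the abstract; everything else here is an assumption.
BUGS_TOTAL = 420           # number of bugs in the study's bug set
BUGS_LOST_PER_STEP = 7.5   # average drop in bugs found per recall step
RECALL_STEP = 0.05         # one step, read here as a 5% decrement of recall


def expected_bugs_lost(recall_drop: float) -> float:
    """Estimate bugs missed for a given recall drop.

    Assumes a linear relationship between recall drop and bugs lost.
    This is an illustrative simplification, not a result of the paper.
    """
    if recall_drop < 0:
        raise ValueError("recall_drop must be non-negative")
    return BUGS_LOST_PER_STEP * (recall_drop / RECALL_STEP)


if __name__ == "__main__":
    for drop in (0.05, 0.10, 0.20):
        lost = expected_bugs_lost(drop)
        share = lost / BUGS_TOTAL
        print(f"recall drop {drop:.0%}: about {lost:.1f} fewer bugs "
              f"({share:.1%} of {BUGS_TOTAL})")
```

Under this linear reading, each 5% step costs under 2% of the bug set, which helps put the abstract's recommendation to favour recall over precision in concrete terms.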

arXiv:2110.01832 [pdf, ps, other]

Does Domain Change the Opinion of Individuals on Human Values? A Preliminary Investigation on eHealth Apps End-users

Authors: Humphrey Obie, Mojtaba Shahin, John Grundy, Burak Turhan, Li Li, Waqar Hussain, Jon Whittle

Abstract: The elicitation of end-users' human values - such as freedom, honesty, transparency, etc. - is important in the development of software systems. We carried out two preliminary Q-studies to understand (a) the general human value opinion types of eHealth applications (apps) end-users, (b) the eHealth domain human value opinion types of eHealth apps end-users, and (c) whether there are differences between the general and eHealth domain opinion types. Our early results show three value opinion types using generic value instruments: (1) fun-loving, success-driven and independent end-user, (2) security-conscious, socially-concerned, and success-driven end-user, and (3) benevolent, success-driven, and conformist end-user. Our results also show two value opinion types using domain-specific value instruments: (1) security-conscious, reputable, and honest end-user, and (2) success-driven, reputable and pain-avoiding end-user. Given these results, consideration should be given to domain context in the design and application of values elicitation instruments.

Submitted 5 October, 2021; originally announced October 2021.

Comments: Preprint accepted to appear in 28th Asia-Pacific Software Engineering Conference (APSEC 2021). 5 Pages

arXiv:2109.12645 [pdf, other]

doi 10.1145/3324884.3416612

Defect Prediction Guided Search-Based Software Testing

Authors: Anjana Perera, Aldeida Aleti, Marcel Böhme, Burak Turhan

Abstract: Today, most automated test generators, such as search-based software testing (SBST) techniques focus on achieving high code coverage. However, high code coverage is not sufficient to maximise the number of bugs found, especially when given a limited testing budget. In this paper, we propose an automated test generation technique that is also guided by the estimated degree of defectiveness of the source code. Parts of the code that are likely to be more defective receive more testing budget than the less defective parts. To measure the degree of defectiveness, we leverage Schwa, a notable defect prediction technique. We implement our approach into EvoSuite, a state of the art SBST tool for Java. Our experiments on the Defects4J benchmark demonstrate the improved efficiency of defect prediction guided test generation and confirm our hypothesis that spending more time budget on likely defective parts increases the number of bugs found in the same time budget.

Submitted 26 September, 2021; originally announced September 2021.

Comments: 13 pages, 8 figures

ACM Class: D.2.5

Journal ref: In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE '20), 2020

arXiv:2104.01024 [pdf, other]

doi 10.1145/3412841.3442020

A Comparison of Similarity Based Instance Selection Methods for Cross Project Defect Prediction

Authors: Seyedrebvar Hosseini, Burak Turhan

Abstract: Context: Previous studies have shown that training data instance selection based on nearest neighborhood (NN) information can lead to better performance in cross project defect prediction (CPDP) by reducing heterogeneity in training datasets. However, neighborhood calculation is computationally expensive and approximate methods such as Locality Sensitive Hashing (LSH) can be as effective as exact methods. Aim: We aim at comparing instance selection methods for CPDP, namely LSH, NN-filter, and Genetic Instance Selection (GIS). Method: We conduct experiments with five base learners, optimizing their hyperparameters, on 13 datasets from the PROMISE repository in order to compare the performance of LSH with benchmark instance selection methods NN-Filter and GIS. Results: The statistical tests show six distinct groups for F-measure performance. The top two groups contain only LSH and GIS benchmarks whereas the bottom two groups contain only NN-Filter variants. LSH and GIS favor recall more than precision. In fact, for precision performance only three significantly distinct groups are detected by the tests where the top group is comprised of NN-Filter variants only. Recall wise, 16 different groups are identified where the top three groups contain only LSH methods, four of the next six are GIS only and the bottom five contain only NN-Filter. Finally, NN-Filter benchmarks never outperform the LSH counterparts with the same base learner, tuned or non-tuned. Further, they never even belong to the same rank group, meaning that LSH is always significantly better than NN-Filter with the same learner and settings. Conclusions: The increase in performance and the decrease in computational overhead and runtime make LSH a promising approach. However, the performance of LSH is based on high recall and in environments where precision is considered more important NN-Filter should be considered.

Submitted 2 April, 2021; originally announced April 2021.

Comments: The 36th ACM/SIGAPP Symposium on Applied Computing (SAC'21), 10 pages

arXiv:2103.05899 [pdf, ps, other]

How to De-Reserves Reserves: Admissions to Technical Colleges in India

Authors: Orhan Aygün, Bertan Turhan

Abstract: We study joint implementation of reservation and de-reservation policies in India that has been enforcing a comprehensive affirmative action since 1950. The landmark judgement of the Supreme Court of India in 2008 mandated that whenever OBC category (with 27 percent reservation) has unfilled positions they must be reverted to general category applicants in admissions to public schools without specifying how to implement it. We disclose the drawbacks of recently reformed allocation procedure in admissions to technical colleges and offer a solution through de-reservation via choice rules. We propose a novel priority design, Backward Transfers (BT) choice rule, for institutions and the deferred acceptance mechanism under these rules (DA-BT) for centralized clearinghouses. We show that DA-BT corrects the shortcomings of existing mechanisms. By formulating the legal requirements and policy goals in India as formal axioms, we show that the DA-BT mechanism is the unique mechanism for concurrent implementation of reservation and de-reservation policies.

Submitted 22 May, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

arXiv:2012.10095 [pdf, other]

A First Look at Human Values-Violation in App Reviews

Authors: Humphrey O. Obie, Waqar Hussain, Xin Xia, John Grundy, Li Li, Burak Turhan, Jon Whittle, Mojtaba Shahin

Abstract: Ubiquitous technologies such as mobile software applications (mobile apps) have a tremendous influence on the evolution of the social, cultural, economic, and political facets of life in society. Mobile apps fulfil many practical purposes for users including entertainment, transportation, financial management, etc. Given the ubiquity of mobile apps in the lives of individuals and the consequent effect of these technologies on society, it is essential to consider the relationship between human values and the development and deployment of mobile apps. The many negative consequences of violating human values such as privacy, fairness or social justice by technology have been documented in recent times. If we can detect these violations in a timely manner, developers can look to better address them. To understand the violation of human values in a range of common mobile apps, we analysed 22,119 app reviews from Google Play Store using natural language processing techniques. We base our values violation detection approach on a widely accepted model of human values; the Schwartz theory of basic human values. The results of our analysis show that 26.5% of the reviews contained text indicating user perceived violations of human values. We found that benevolence and self-direction were the most violated value categories, and conformity and tradition were the least violated categories. Our results also highlight the need for a proactive approach to the alignment of values amongst stakeholders and the use of app reviews as a valuable additional source for mining values requirements.

Submitted 18 December, 2020; originally announced December 2020.

Comments: 10 pages, Accepted for publication in IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS), IEEE, 2021

arXiv:2012.01011 [pdf, ps, other]

Assignment Maximization

Authors: Mustafa Oğuz Afacan, Inácio Bó, Bertan Turhan

Abstract: We evaluate the goal of maximizing the number of individuals matched to acceptable outcomes. We show that it implies incentive, fairness, and implementation impossibilities. Despite that, we present two classes of mechanisms that maximize assignments. The first are Pareto efficient, and undominated -- in terms of number of assignments -- in equilibrium. The second are fair for unassigned students and assign weakly more students than stable mechanisms in equilibrium.

Submitted 2 December, 2020; originally announced December 2020.

arXiv:2011.11942 [pdf, other]

A Family of Experiments on Test-Driven Development

Authors: Adrian Santos, Sira Vegas, Oscar Dieste, Fernando Uyaguari, Aysee Tosun, Davide Fucci, Burak Turhan, Giuseppe Scanniello, Simone Romano, Itir Karac, Marco Kuhrmann, Vladimir Mandic, Robert Ramac, Dietmar Pfahl, Christian Engblom, Jarno Kyykka, Kerli Rungi, Carolina Palomeque, Jaroslav Spisak, Markku Oivo, Natalia Juristo

Abstract: Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, or the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than for students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative, respectively). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process...

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2011.06244 [pdf, other]

A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Authors: Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu , et al. (23 additional authors not shown)

Abstract: Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.

Submitted 13 October, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: Status: Accepted at Empirical Software Engineering
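
The consensus rule described in the abstract above (four labels per line, consensus when at least three agree) can be illustrated with a minimal sketch. The function name, the label strings, and the example votes are hypothetical and chosen only for illustration; this is not the authors' tooling.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agreement: int = 3) -> Optional[str]:
    """Return the label given by at least `min_agreement` raters, else None."""
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical votes: four participants label each changed line.
line_votes = {
    "line 1": ["bugfix", "bugfix", "bugfix", "refactoring"],
    "line 2": ["bugfix", "test", "refactoring", "bugfix"],
}
consensus = {line: consensus_label(votes) for line, votes in line_votes.items()}
```

With four raters and a threshold of three, at most one label can reach consensus, so returning the single most frequent label is sufficient. Lines without consensus map to `None`, which corresponds to the disagreement cases the abstract mentions.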

arXiv:2010.03525 [pdf]

Empirical Standards for Software Engineering Research

Authors: Paul Ralph, Nauman bin Ali, Sebastian Baltes, Domenico Bianculli, Jessica Diaz, Yvonne Dittrich, Neil Ernst, Michael Felderer, Robert Feldt, Antonio Filieri, Breno Bernard Nicolau de França, Carlo Alberto Furia, Greg Gay, Nicolas Gold, Daniel Graziotin, Pinjia He, Rashina Hoda, Natalia Juristo, Barbara Kitchenham, Valentina Lenarduzzi, Jorge Martínez, Jorge Melegati, Daniel Mendez, Tim Menzies, Jefferson Molleri , et al. (18 additional authors not shown)

Abstract: Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

Submitted 4 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: For the complete standards, supplements and other resources, see https://github.com/acmsigsoft/EmpiricalStandards

arXiv:2008.12528 [pdf, ps, other]

Researcher Bias in Software Engineering Experiments: a Qualitative Investigation

Authors: Simone Romano, Davide Fucci, Giuseppe Scanniello, Maria Teresa Baldassarre, Burak Turhan, Natalia Juristo

Abstract: Researcher Bias (RB) occurs when researchers influence the results of an empirical study based on their expectations. RB might be due to the use of Questionable Research Practices (QRPs). In research fields like medicine, blinding techniques have been applied to counteract RB. We conducted an explorative qualitative survey to investigate RB in Software Engineering (SE) experiments, with respect to (i) QRPs potentially leading to RB, (ii) causes behind RB, and (iii) possible actions to counteract including blinding techniques. Data collection was based on semi-structured interviews. We interviewed nine active experts in the empirical SE community. We then analyzed the transcripts of these interviews through thematic analysis. We found that some QRPs are acceptable in certain cases. Also, it appears that the presence of RB is perceived in SE and, to counteract RB, a number of solutions have been highlighted: some are intended for SE researchers and others for the boards of SE research outlets.

Submitted 28 August, 2020; originally announced August 2020.

Comments: Published at SEAA2020

arXiv:2005.01127 [pdf, other]

doi 10.1007/s10664-020-09875-y

Pandemic Programming: How COVID-19 affects software developers and how their organizations can help

Authors: Paul Ralph, Sebastian Baltes, Gianisa Adisaputri, Richard Torkar, Vladimir Kovalenko, Marcos Kalinowski, Nicole Novielli, Shin Yoo, Xavier Devroey, Xin Tan, Minghui Zhou, Burak Turhan, Rashina Hoda, Hideaki Hata, Gregorio Robles, Amin Milani Fard, Rana Alkadhi

Abstract: Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers' wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.

Submitted 20 July, 2020; v1 submitted 3 May, 2020; originally announced May 2020.

Comments: 34 pages, 7 tables, 5 figures, to appear in Empirical Software Engineering

Journal ref: Empirical Software Engineering, 2020

arXiv:2005.01103 [pdf, ps, other]

Dynamic Reserves in Matching Markets

Authors: Orhan Aygün, Bertan Turhan

Abstract: We study a school choice problem under affirmative action policies where authorities reserve a certain fraction of the slots at each school for specific student groups, and where students have preferences not only over the schools they are matched to but also the type of slots they receive. Such reservation policies might cause waste in instances of low demand from some student groups. To propose a solution to this issue, we construct a family of choice functions, dynamic reserves choice functions, for schools that respect within-group fairness and allow the transfer of otherwise vacant slots from low-demand groups to high-demand groups. We propose the cumulative offer mechanism (COM) as an allocation rule where each school uses a dynamic reserves choice function and show that it is stable with respect to schools' choice functions, is strategy-proof, and respects improvements. Furthermore, we show that transferring more of the otherwise vacant slots leads to strategy-proof Pareto improvement under the COM.

Submitted 3 May, 2020; originally announced May 2020.

arXiv:2004.13265 [pdf, ps, other]

Slot-specific Priorities with Capacity Transfers

Authors: Michelle Avataneo, Bertan Turhan

Abstract: In many real-world matching applications, there are restrictions for institutions either on priorities of their slots or on the transferability of unfilled slots over others (or both). Motivated by the need in such real-life matching problems, this paper formulates a family of practical choice rules, slot-specific priorities with capacity transfers (SSPwCT). These practical rules invoke both slot-specific priorities structure and transferability of vacant slots. We show that the cumulative offer mechanism (COM) is stable, strategy-proof and respects improvements with regards to SSPwCT choice rules. Transferring the capacity of one more unfilled slot, while all else is constant, leads to strategy-proof Pareto improvement of the COM. Following Kominer's (2020) formulation, we also provide comparative static results for expansion of branch capacity and addition of new contracts in the SSPwCT framework. Our results have implications for resource allocation problems with diversity considerations.

Submitted 21 September, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

arXiv:2004.13264

Designing Direct Matching Mechanism for India with Comprehensive Affirmative Action

Authors: Orhan Aygün, Bertan Turhan

Abstract: Since 1950, India has been implementing the most comprehensive affirmative action program in the world. Vertical reservations are provided to members of historically discriminated Scheduled Castes (SC), Scheduled Tribes (ST), and Other Backward Classes (OBC). Horizontal reservations are provided for other disadvantaged groups, such as women and disabled people, within each vertical category. There is no well-defined procedure to implement horizontal reservations jointly with vertical reservation and OBC de-reservations. Sequential processes currently in use for OBC de-reservations and meritorious reserve candidates lead to severe shortcomings. Most importantly, indirect mechanisms currently used in practice do not allow reserve category applicants to fully express their preferences. To overcome these and other related issues, we design several different choice rules for institutions that take meritocracy, vertical and horizontal reservations, and OBC de-reservations into account. We propose a centralized mechanism to satisfactorily clear matching markets in India.

Submitted 26 December, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: This paper is merged with another paper

arXiv:2004.13261

Matching with Generalized Lexicographic Choice Rules

Authors: Orhan Aygün, Bertan Turhan

Abstract: Motivated by the need for real-world matching problems, this paper formulates a large class of practical choice rules, Generalized Lexicographic Choice Rules (GLCR), for institutions that consist of multiple divisions. Institutions fill their divisions sequentially, and each division is endowed with a sub-choice rule that satisfies classical substitutability and size monotonicity in conjunction with a new property that we introduce, quota monotonicity. We allow rich interactions between divisions in the form of capacity transfers. The overall choice rule of an institution is defined as the union of the sub-choices of its divisions. The cumulative offer mechanism (COM) with respect to GLCR is the unique stable and strategy-proof mechanism. We define a choice-based improvement notion and show that the COM respects improvements. We employ the theory developed in this paper in our companion paper, Aygün and Turhan (2020), to design satisfactory matching mechanisms for India with comprehensive affirmative action constraints.

Submitted 28 July, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: Exposition needs major editing and new results will be added

arXiv:2004.05335 [pdf, ps, other]

Increasing Validity Through Replication: An Illustrative TDD Case

Authors: Adrian Santos, Sira Vegas, Fernando Uyaguari, Oscar Dieste, Burak Turhan, Natalia Juristo

Abstract: Context: Software Engineering (SE) experiments suffer from threats to validity that may impact their results. Replication allows researchers building on top of previous experiments' weaknesses and increasing the reliability of the findings. Objective: Illustrating the benefits of replication to increase the reliability of the findings and uncover moderator variables. Method: We replicate an experiment on Test-Driven-Development (TDD) and address some of its threats to validity and those of a previous replication. We compare the replications' results and hypothesize on plausible moderators impacting results. Results: Differences across TDD replications' results might be due to the operationalization of the response variables, the allocation of subjects to treatments, the allowance to work outside the laboratory, the provision of stubs, or the task. Conclusion: Replications allow examining the robustness of the findings, hypothesizing on plausible moderators influencing results, and strengthening the evidence obtained.

Submitted 11 April, 2020; originally announced April 2020.

arXiv:1909.05042

Iterative versus Exhaustive Data Selection for Cross Project Defect Prediction: An Extended Replication Study

Authors: Seyedrebvar Hosseini, Burak Turhan

Abstract: Context: The effectiveness of data selection approaches in improving the performance of cross project defect prediction (CPDP) has been shown in multiple previous studies. Besides that, replication studies play an important role in the support of any valid study. Repeating a study using the same or different subjects can lead to better understandings of the nature of the problem. Objective: We use an iterative dataset selection (IDS) approach to generate training datasets and evaluate them on a set of randomly created validation datasets in the context of CPDP while considering a higher range of flexibility which makes the approach more feasible in practice. Method: We replicate an earlier study and present some insights into the achieved results while pointing out some of the shortcomings of the original study. Using the lessons learned, we propose to use an alternative training/validation dataset generation approach which not only is more feasible in practice, but also achieves slightly better performances. We compare the results of our experiments to those from scenarios A, B, C and D from the original study. Results: Our experiments reveal that IDS is heavily recall based. The average recall performance for all test sets is 0.933 which is significantly higher than that from the replicated method. This in turn comes with a loss in precision. IDS has the lowest precision among the compared scenarios that use Decision Table learner. IDS, however, achieves comparable or better F-measure performances. IDS achieves higher mean, median and min F-measure values while being more stable generally, in comparison with the replicated method. Conclusions: We conclude that datasets obtained from iterative/search-based approaches are a promising way to tackle CPDP. Especially, the performance increase in terms of both time and performance encourages further investigation of our approach.

Submitted 21 April, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: Conducting a major revision based on the feedback from the Empirical Software Engineering Journal

arXiv:1902.11278 [pdf]

doi 10.1109/TSE.2019.2909033

Requirements Framing Affects Design Creativity

Authors: Rahul Mohanani, Burak Turhan, Paul Ralph

Abstract: Design creativity, the originality and practicality of a solution concept, is critical for the success of many software projects. However, little research has investigated the relationship between the way desiderata are presented and design creativity. This study therefore investigates the impact of presenting desiderata as ideas, requirements or prioritized requirements on design creativity. Two between-subjects randomized controlled experiments were conducted with 42 and 34 participants. Participants were asked to create design concepts from a list of desiderata. Participants who received desiderata framed as requirements or prioritized requirements created designs that are, on average, less original but more practical than the designs created by participants who received desiderata framed as ideas. This suggests that more formal, structured presentations of desiderata are less appropriate where a creative solution is desired. The results also show that design performance is highly susceptible to minor changes in the vernacular used to communicate desiderata.

Submitted 28 February, 2019; originally announced February 2019.

arXiv:1810.12589 [pdf]

Key Stakeholders' Value Propositions for Feature Selection in Software-intensive Products: An Industrial Case Study

Authors: Pilar Rodríguez, Emilia Mendes, Burak Turhan

Abstract: Numerous software companies are adopting value-based decision making. However, what does value mean for key stakeholders making decisions? How do different stakeholder groups understand value? Without an explicit understanding of what value means, decisions are subject to ambiguity and vagueness, which are likely to bias them. This case study provides an in-depth analysis of key stakeholders' value propositions when selecting features for a large telecommunications company's software-intensive product. Stakeholders' value propositions were elicited via interviews, which were analyzed using Grounded Theory coding techniques (open and selective coding). Thirty-six value propositions were identified and classified into six dimensions: customer value, market competitiveness, economic value/profitability, cost efficiency, technology & architecture, and company strategy. Our results show that although propositions in the customer value dimension were those mentioned the most, the concept of value for feature selection encompasses a wide range of value propositions. Moreover, stakeholder groups focused on different and complementary value dimensions, calling to the importance of involving all key stakeholders in the decision making process. Although our results are particularly relevant to companies similar to the one described herein, they aim to generate a learning process on value-based feature selection for practitioners and researchers in general.

Submitted 30 October, 2018; originally announced October 2018.

arXiv:1809.01510 [pdf]

On the Need of Preserving Order of Data When Validating Within-Project Defect Classifiers

Authors: Davide Falessi, Jacky Huang, Likhita Narayana, Jennifer Fong Thai, Burak Turhan

Abstract: [Context] The use of defect prediction models, such as classifiers, can support testing resource allocations by using data of the previous releases of the same project for predicting which software components are likely to be defective. A validation technique (hereinafter, technique) defines a specific way to split available data into training and test sets to measure a classifier's accuracy. Time-series techniques have the unique ability to preserve the temporal order of data, i.e., they prevent the testing set from containing data antecedent to the training set. [Aim] The aim of this paper is twofold: first, we check if there is a difference in the classifiers' accuracy measured by time-series versus non-time-series techniques. Afterward, we check for a possible reason for this difference, i.e., if defect rates change across releases of a project. [Method] Our method consists of measuring the accuracy, i.e., AUC, of 10 classifiers on 13 open and two closed projects by using three validation techniques, namely cross validation, bootstrap, and walk-forward, where only the latter is a time-series technique. [Results] We find that the AUC of the same classifier used on the same project and measured by 10-fold varies compared to when measured by walk-forward in the range [-0.20, 0.22], and it is statistically different in 45% of the cases. Similarly, the AUC measured by bootstrap varies compared to when measured by walk-forward in the range [-0.17, 0.43], and it is statistically different in 56% of the cases. [Conclusions] We recommend choosing the technique to be used by carefully considering the conclusions to draw, the property of the available datasets, and the level of realism with the classifier usage scenario.
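
The order-preserving property that the abstract attributes to walk-forward validation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `walk_forward_splits` and the release labels are hypothetical, and the sketch assumes one common variant in which the training set grows with every release that precedes the test release.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    releases: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training releases, test release) pairs in chronological order.

    Assumes `releases` is already sorted from oldest to newest. Each test
    release is evaluated only against releases that came before it, so the
    test set never contains data antecedent to the training set.
    """
    for i in range(1, len(releases)):
        yield list(releases[:i]), releases[i]


# Hypothetical release labels, oldest first.
releases = ["1.0", "1.1", "1.2", "1.3"]
splits = list(walk_forward_splits(releases))

for train, test in splits:
    print(f"train on {train}, test on {test}")
```

By contrast, 10-fold cross validation and bootstrap sample across all releases at once, so a later release can end up in the training data for an earlier one.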

Submitted 31 July, 2020; v1 submitted 5 September, 2018; originally announced September 2018.

arXiv:1807.04100 [pdf, other]

The Effect of Noise on Software Engineers' Performance

Authors: Simone Romano, Giuseppe Scanniello, Davide Fucci, Natalia Juristo, Burak Turhan

Abstract: Background: Noise, defined as an unwanted sound, is one of the commonest factors that could affect people's performance in their daily work activities. The software engineering research community has marginally investigated the effects of noise on software engineers' performance. Aims: We studied if noise affects software engineers' performance in (i) comprehending functional requirements and (ii) fixing faults in the source code. Method: We conducted two experiments with final-year undergraduate students in Computer Science. In the first experiment, we asked 55 students to comprehend functional requirements exposing them or not to noise, while in the second experiment 42 students were asked to fix faults in Java code. Results: The participants in the second experiment, when exposed to noise, had significantly worse performance in fixing faults in the source code. On the other hand, we did not observe any statistically significant difference in the first experiment. Conclusions: Fixing faults in source code seems to be more vulnerable to noise than comprehending functional requirements.

Submitted 11 July, 2018; originally announced July 2018.

Comments: ESEM18, Oulu (Finland), October 2018

arXiv:1707.03869 [pdf]

doi 10.1109/TSE.2018.2877759

Cognitive Biases in Software Engineering: A Systematic Mapping Study

Authors: Rahul Mohanani, Iflaah Salman, Burak Turhan, Pilar Rodriguez, Paul Ralph

Abstract: One source of software project challenges and failures is the systematic errors introduced by human cognitive biases. Although extensively explored in cognitive psychology, investigations concerning cognitive biases have only recently gained popularity in software engineering (SE) research. This paper therefore systematically maps, aggregates and synthesizes the literature on cognitive biases in software engineering to generate a comprehensive body of knowledge, understand state of the art research and provide guidelines for future research and practise. Focusing on bias antecedents, effects and mitigation techniques, we identified 65 articles, which investigate 37 cognitive biases, published between 1990 and 2016. Despite strong and increasing interest, the results reveal a scarcity of research on mitigation techniques and poor theoretical foundations in understanding and interpreting cognitive biases. Although bias-related research has generated many new insights in the software engineering community, specific bias mitigation techniques are still needed for software professionals to overcome the deleterious effects of cognitive biases on their work.

Submitted 23 October, 2018; v1 submitted 12 July, 2017; originally announced July 2017.

Comments: Pre-print submitted to IEEE Transactions on Software Engineering

Journal ref: IEEE Transactions on Software Engineering, 46(12), 1318-1339 (2018)

arXiv:1611.05994 [pdf, other]

doi 10.1109/TSE.2016.2616877

A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?

Authors: Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo

Abstract: Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.

Submitted 18 November, 2016; originally announced November 2016.

Showing 1–36 of 36 results for author: Turhan, B