Showing 1–37 of 37 results for author: Tantithamthavorn, C

  1. arXiv:2404.04839

    cs.SE cs.AI

    AI for DevSecOps: A Landscape and Future Opportunities

    Authors: Michael Fu, Jirat Pasuksmit, Chakkrit Tantithamthavorn

    Abstract: DevOps has emerged as one of the most rapidly evolving software development paradigms. With the growing concerns surrounding security in software systems, the DevSecOps paradigm has gained prominence, urging practitioners to incorporate security practices seamlessly into the DevOps workflow. However, integrating security into the DevOps workflow can impact agility and impede delivery speed. Recent…

    Submitted 7 April, 2024; originally announced April 2024.

  2. arXiv:2403.15481

    cs.CY cs.AI cs.SE

    Navigating Fairness: Practitioners' Understanding, Challenges, and Strategies in AI/ML Development

    Authors: Aastha Pant, Rashina Hoda, Chakkrit Tantithamthavorn, Burak Turhan

    Abstract: The rise in the use of AI/ML applications across industries has sparked more discussions about the fairness of AI/ML in recent times. While prior research on the fairness of AI/ML exists, there is a lack of empirical studies focused on understanding the views and experiences of AI practitioners in developing a fair AI/ML. Understanding AI practitioners' views and experiences on the fairness of AI/…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 31 pages, 8 figures, 2 tables

  3. arXiv:2402.11910

    cs.SE

    Enhancing Large Language Models for Text-to-Testcase Generation

    Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Chetan Arora, Aldeida Aleti

    Abstract: Context: Test-driven development (TDD) is a widely employed software development practice that involves developing test cases based on requirements prior to writing the code. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements instead of code serve as input. Objective: In this paper, we introduce a text-to-t…

    Submitted 19 February, 2024; originally announced February 2024.

  4. arXiv:2402.09651

    cs.SE cs.LG

    Practitioners' Challenges and Perceptions of CI Build Failure Predictions at Atlassian

    Authors: Yang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit, Patanamon Thongtanunam, Arik Friedman, Xing Zhao, Anton Krasikov

    Abstract: Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key fa…

    Submitted 14 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  5. arXiv:2402.00905

    cs.SE

    Fine-Tuning and Prompt Engineering for Large Language Models-based Code Review Automation

    Authors: Chanathip Pornprasit, Chakkrit Tantithamthavorn

    Abstract: Context: The rapid evolution of Large Language Models (LLMs) has sparked significant interest in leveraging their capabilities for automating code review processes. Prior studies often focus on developing LLMs for code review automation, yet require expensive resources, which is infeasible for organizations with limited budgets and resources. Thus, fine-tuning and prompt engineering are the two co…

    Submitted 16 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 13 pages. Submit to IST journal

  6. arXiv:2401.07576

    cs.SE

    TDD Without Tears: Towards Test Case Generation from Requirements through Deep Reinforcement Learning

    Authors: Wannita Takerngsaksiri, Rujikorn Charakorn, Chakkrit Tantithamthavorn, Yuan-Fang Li

    Abstract: Test-driven development (TDD) is a widely-employed software development practice that mandates writing test cases based on requirements before writing the actual code. While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers. To address these issues associated with TDD, automated test case generation approaches have recently been investig…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 21 pages, 5 figures

  7. arXiv:2311.00177

    cs.SE

    Students' Perspective on AI Code Completion: Benefits and Challenges

    Authors: Wannita Takerngsaksiri, Cleshan Warusavitarne, Christian Yaacoub, Matthew Hee Keng Hou, Chakkrit Tantithamthavorn

    Abstract: AI Code Completion (e.g., GitHub's Copilot) has revolutionized how computer science students interact with programming languages. However, AI code completion has been studied from the developers' perspectives, not the students' perspectives who represent the future generation of our digital world. In this paper, we investigated the benefits, challenges, and expectations of AI code completion from…

    Submitted 31 May, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted at COMPSAC 2024 Workshop (The 7th IEEE International Workshop on Advances in Artificial Intelligence and Machine Learning: AI & ML for a Sustainable and Better Future)

  8. arXiv:2310.17903

    cs.SE cs.AI

    Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

    Authors: Xinyu She, Yue Liu, Yanjie Zhao, Yiling He, Li Li, Chakkrit Tantithamthavorn, Zhan Qin, Haoyu Wang

    Abstract: Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic perfor…

    Submitted 27 October, 2023; originally announced October 2023.

  9. arXiv:2310.09810

    cs.SE cs.CR

    ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We?

    Authors: Michael Fu, Chakkrit Tantithamthavorn, Van Nguyen, Trung Le

    Abstract: Large language models (LLMs) like ChatGPT (i.e., gpt-3.5-turbo and gpt-4) exhibited remarkable advancement in a range of software engineering tasks associated with source code such as code review and code generation. In this paper, we undertake a comprehensive study by instructing ChatGPT for four prevalent vulnerability tasks: function and line-level vulnerability prediction, vulnerability classi…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted at the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

  10. arXiv:2310.06308

    cs.SE

    Unit Testing Challenges with Automated Marking

    Authors: Chakkrit Tantithamthavorn, Norman Chen

    Abstract: Teaching software testing presents difficulties due to its abstract and conceptual nature. The lack of tangible outcomes and limited emphasis on hands-on experience further compound the challenge, often leading to difficulties in comprehension for students. This can result in waning engagement and diminishing motivation over time. In this paper, we introduce online unit testing challenges with aut…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 5 pages, accepted at the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

  11. arXiv:2307.12596

    cs.SE

    Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

    Authors: Yue Liu, Thanh Le-Cong, Ratnadira Widyasari, Chakkrit Tantithamthavorn, Li Li, Xuan-Bach D. Le, David Lo

    Abstract: We systematically study the quality of 4,066 ChatGPT-generated code implemented in two popular programming languages, i.e., Java and Python, for 2,033 programming tasks. The goal of this work is three folds. First, we analyze the correctness of ChatGPT on code generation tasks and uncover the factors that influence its effectiveness, including task difficulty, programming language, time that tasks…

    Submitted 14 December, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  12. arXiv:2307.10057

    cs.CY cs.AI cs.SE

    Ethics in the Age of AI: An Analysis of AI Practitioners' Awareness and Challenges

    Authors: Aastha Pant, Rashina Hoda, Simone V. Spiegler, Chakkrit Tantithamthavorn, Burak Turhan

    Abstract: Ethics in AI has become a debated topic of public and expert discourse in recent years. But what do people who build AI - AI practitioners - have to say about their understanding of AI ethics and the challenges associated with incorporating it in the AI-based systems they develop? Understanding AI practitioners' views on AI ethics is important as they are the ones closest to the AI systems and can…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 36 pages, 8 figures, 4 tables

  13. arXiv:2306.06109

    cs.CR cs.AI cs.LG

    Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

    Authors: Michael Fu, Trung Le, Van Nguyen, Chakkrit Tantithamthavorn, Dinh Phung

    Abstract: Deep learning (DL) models have become increasingly popular in identifying software vulnerabilities. Prior studies found that vulnerabilities across different vulnerable programs may exhibit similar vulnerable scopes, implicitly forming discernible vulnerability patterns that can be learned by DL models through supervised training. However, vulnerable scopes still manifest in various spatial locati…

    Submitted 26 May, 2023; originally announced June 2023.

  14. arXiv:2305.16615

    cs.SE cs.CR

    AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities

    Authors: Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Yuki Kume, Van Nguyen, Dinh Phung, John Grundy

    Abstract: Many ML-based approaches have been proposed to automatically detect, localize, and repair software vulnerabilities. While ML-based methods are more effective than program analysis-based vulnerability analysis tools, few have been integrated into modern IDEs, hindering practical adoption. To bridge this critical gap, we propose AIBugHunter, a novel ML-based software vulnerability analysis tool for…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 34 pages, Accepted at Empirical Software Engineering Journal

    Journal ref: Empirical Software Engineering (EMSE), 2023

  15. arXiv:2302.10352

    cs.SE

    A3Test: Assertion-Augmented Automated Test Case Generation

    Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Aldeida Aleti

    Abstract: Test case generation is an important activity, yet a time-consuming and laborious task. Recently, AthenaTest -- a deep learning approach for generating unit test cases -- is proposed. However, AthenaTest can generate less than one-fifth of the test cases correctly, due to a lack of assertion knowledge and test signature verification. In this paper, we propose A3Test, a DL-based test case generatio…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Under Review at ACM Transactions on Software Engineering and Methodology

  16. arXiv:2302.09587

    cs.SE

    On the Reliability and Explainability of Language Models for Program Generation

    Authors: Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Li Li

    Abstract: Recent studies have adopted pre-trained language models, such as CodeT5 and CodeGPT, for automated program generation tasks like code generation, repair, and translation. Numerous language model-based approaches have been proposed and evaluated on various benchmark datasets, demonstrating promising performance. However, there is still uncertainty about the reliability of these models, particularly…

    Submitted 8 January, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)

  17. arXiv:2302.06065

    cs.SE

    A Systematic Literature Review of Explainable AI for Software Engineering

    Authors: Ahmad Haji Mohammadkhani, Nitin Sai Bommi, Mariem Daboussi, Onkar Sabnis, Chakkrit Tantithamthavorn, Hadi Hemmati

    Abstract: Context: In recent years, leveraging machine learning (ML) techniques has become one of the main solutions to tackle many software engineering (SE) tasks, in research studies (ML4SE). This has been achieved by utilizing state-of-the-art models that tend to be more complex and black-box, which is led to less explainable solutions that reduce trust and uptake of ML4SE solutions by professionals in t…

    Submitted 12 February, 2023; originally announced February 2023.

  18. arXiv:2211.12821

    cs.SE

    Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?

    Authors: Ahmad Haji Mohammadkhani, Chakkrit Tantithamthavorn, Hadi Hemmati

    Abstract: In recent years, there has been a wide interest in designing deep neural network-based models that automate downstream software engineering tasks on source code, such as code document generation, code search, and program repair. Although the main objective of these studies is to improve the effectiveness of the downstream task, many studies only attempt to employ the next best neural network model…

    Submitted 28 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: 10 pages, 7 figures, Accepted at SCAM 2023

  19. arXiv:2211.04673

    cs.SE cs.AI

    Syntax-Aware On-the-Fly Code Completion

    Authors: Wannita Takerngsaksiri, Chakkrit Tantithamthavorn, Yuan-Fang Li

    Abstract: Code completion aims to help improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed to incorporate abstract syntax tree (AST) information for model training, ensuring that code completion is aware of the syntax of the programming languages. However, existing syntax-aware code completion approaches are not on-the-fly, as we fo…

    Submitted 1 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 17 pages, Under Review at IEEE Transactions on Software Engineering

  20. arXiv:2209.10414

    cs.CR cs.AI cs.LG

    Statement-Level Vulnerability Detection: Learning Vulnerability Patterns Through Information Theory and Contrastive Learning

    Authors: Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, Michael Fu, John Grundy, Hung Nguyen, Seyit Camtepe, Paul Quirk, Dinh Phung

    Abstract: Software vulnerabilities are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, there are only a few statements causing the corresponding vulnerabilities. Most current approaches to vulnerability labelling are done on a function or program level by experts with the assistance of machine learning tools. Extending this ap…

    Submitted 11 June, 2024; v1 submitted 19 September, 2022; originally announced September 2022.

  21. arXiv:2209.10406

    cs.CR cs.AI cs.LG

    Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle

    Authors: Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, Hung Nguyen, Dinh Phung

    Abstract: Software vulnerabilities (SVs) have become a common, serious and crucial concern due to the ubiquity of computer software. Many machine learning-based approaches have been proposed to solve the software vulnerability detection (SVD) problem. However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SV…

    Submitted 19 September, 2022; originally announced September 2022.

  22. arXiv:2209.07048

    cs.SE

    Automatically Recommend Code Updates: Are We There Yet?

    Authors: Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Patanamon Thongtanunam, Li Li

    Abstract: In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on r…

    Submitted 12 May, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Under review at a SE journal

  23. arXiv:2209.00812

    cs.CR cs.SE

    Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?

    Authors: Yue Liu, Chakkrit Tantithamthavorn, Li Li, Yepang Liu

    Abstract: Machine learning (ML)-based Android malware detection has been one of the most popular research topics in the mobile security community. An increasing number of research studies have demonstrated that machine learning is an effective and promising approach for malware detection, and some works have even claimed that their proposed models could achieve 99% detection accuracy, leaving little room f…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: Accepted by the 33rd IEEE International Symposium on Software Reliability Engineering (ISSRE 2022)

  24. arXiv:2206.09514

    cs.SE

    Ethics in AI through the Practitioner's View: A Grounded Theory Literature Review

    Authors: Aastha Pant, Rashina Hoda, Chakkrit Tantithamthavorn, Burak Turhan

    Abstract: The term ethics is widely used, explored, and debated in the context of developing Artificial Intelligence (AI) based software systems. In recent years, numerous incidents have raised the profile of ethical issues in AI development and led to public concerns about the proliferation of AI technology in our everyday lives. But what do we know about the views and experiences of those who develop thes…

    Submitted 19 February, 2024; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: 57 pages, 6 figures, 3 tables

  25. Software Engineering in Australasia

    Authors: Sherlock A. Licorish, Christoph Treude, John Grundy, Chakkrit Tantithamthavorn, Kelly Blincoe, Stephen MacDonell, Li Li, Jean-Guy Schneider

    Abstract: Six months ago an important call was made for researchers globally to provide insights into the way Software Engineering is done in their region. Heeding this call we hereby outline the position Software Engineering in Australasia (New Zealand and Australia). This article first considers the software development methods practices and tools that are popular in the Australasian software engineering…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Journal article, 1 figure, 3 pages

    Journal ref: Software Engineering in Australasia, SIGSOFT Softw. Eng. Notes 46, 2(April 2021), pp. 16-17

  26. arXiv:2103.07068

    cs.SE

    JITLine: A Simpler, Better, Faster, Finer-grained Just-In-Time Defect Prediction

    Authors: Chanathip Pornprasit, Chakkrit Tantithamthavorn

    Abstract: A Just-In-Time (JIT) defect prediction model is a classifier to predict if a commit is defect-introducing. Recently, CC2Vec -- a deep learning approach for Just-In-Time defect prediction -- has been proposed. However, CC2Vec requires the whole dataset (i.e., training + testing) for model training, assuming that all unlabelled testing datasets would be available beforehand, which does not follow th…

    Submitted 16 March, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 11 pages, accepted at 2021 International Conference on Mining Software Repositories (MSR'21)

  27. arXiv:2103.05292

    cs.CR cs.LG cs.SE

    Deep Learning for Android Malware Defenses: a Systematic Literature Review

    Authors: Yue Liu, Chakkrit Tantithamthavorn, Li Li, Yepang Liu

    Abstract: Malicious applications (particularly those targeting the Android platform) pose a serious threat to developers and end-users. Numerous research efforts have been devoted to developing effective approaches to defend against Android malware. However, given the explosive growth of Android malware and the continuous advancement of malicious evasion technologies like obfuscation and reflection, Android…

    Submitted 9 August, 2022; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: Accepted by ACM Computing Surveys

  28. arXiv:2102.12007

    cs.SE

    Practitioners' Perceptions of the Goals and Visual Explanations of Defect Prediction Models

    Authors: Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, John Grundy

    Abstract: Software defect prediction models are classifiers that are constructed from historical software data. Such software defect prediction models have been proposed to help developers optimize the limited Software Quality Assurance (SQA) resources and help managers develop SQA plans. Prior studies have different goals for their defect prediction models and use different techniques for generating visual…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at the International Conference on Mining Software Repositories (MSR'21) (10 pages + 2 references)

  29. arXiv:2102.09687

    cs.SE cs.LG

    SQAPlanner: Generating Data-Informed Software Quality Improvement Plans

    Authors: Dilini Rajapaksha, Chakkrit Tantithamthavorn, Jirayus Jiarpakdee, Christoph Bergmeir, John Grundy, Wray Buntine

    Abstract: Software Quality Assurance (SQA) planning aims to define proactive plans, such as defining maximum file size, to prevent the occurrence of software defects in future releases. To aid this, defect prediction models have been proposed to generate insights as the most important factors that are associated with software quality. Such insights that are derived from traditional defect models are far fro…

    Submitted 27 March, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: This work has been Accepted by the IEEE Transactions on Software Engineering. Copyright may be transferred without notice, after which this version may no longer be accessible 24 pages

  30. arXiv:2101.04837

    cs.SE

    Assessing the Students' Understanding and their Mistakes in Code Review Checklists -- An Experience Report of 1,791 Code Review Checklist Questions from 394 Students

    Authors: Chun Yong Chong, Patanamon Thongtanunam, Chakkrit Tantithamthavorn

    Abstract: Code review is a widely-used practice in software development companies to identify defects. Hence, code review has been included in many software engineering curricula at universities worldwide. However, teaching code review is still a challenging task because the code review effectiveness depends on the code reading and analytical skills of a reviewer. While several studies have investigated the…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 10 pages, accepted at the International Conference on Software Engineering: Joint Track on Software Engineering Education and Training Track (ICSE'21-JSEET)

  31. arXiv:2012.01614

    cs.SE cs.AI cs.CY

    Explainable AI for Software Engineering

    Authors: Chakkrit Tantithamthavorn, Jirayus Jiarpakdee, John Grundy

    Abstract: Artificial Intelligence/Machine Learning techniques have been widely used in software engineering to improve developer productivity, the quality of software systems, and decision-making. However, such AI/ML models for software engineering are still impractical, not explainable, and not actionable. These concerns often hinder the adoption of AI/ML models in software engineering practices. In this a…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: Under Review at IEEE Computer Magazine

  32. Predicting Defective Lines Using a Model-Agnostic Technique

    Authors: Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Hideaki Hata, Kenichi Matsumoto

    Abstract: Defect prediction models are proposed to help a team prioritize source code areas files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in thi…

    Submitted 8 September, 2020; originally announced September 2020.

  33. arXiv:1806.09791

    cs.SE cs.LG

    AutoSpearman: Automatically Mitigating Correlated Metrics for Interpreting Defect Models

    Authors: Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Christoph Treude

    Abstract: The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated to defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques p…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: Accepted for publication at the International Conference on Software Maintenance and Evolution (ICSME 2018)

  34. The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization

    Authors: Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan, Akinori Ihara, Kenichi Matsumoto

    Abstract: Context: IR-based bug localization is a classifier that assists developers in locating buggy source code entities (e.g., files and methods) based on the content of a bug report. Such IR-based classifiers have various parameters that can be configured differently (e.g., the choice of entity representation). Objective: In this paper, we investigate the impact of the choice of the IR-based classifier…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: Accepted at Journal of Information and Software Technology (IST)

  35. arXiv:1801.10271

    cs.SE

    The Impact of Correlated Metrics on Defect Models

    Authors: Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Ahmed E. Hassan

    Abstract: Defect models are analytical models that are used to build empirical theories that are related to software quality. Prior studies often derive knowledge from such models using interpretation techniques, such as ANOVA Type-I. Recent work raises concerns that prior studies rarely remove correlated metrics when constructing such models. Such correlated metrics may impact the interpretation of models.…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: 16 pages, under review at a software engineering journal

  36. arXiv:1801.10270

    cs.SE

    The Impact of Automated Parameter Optimization on Defect Prediction Models

    Authors: Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto

    Abstract: Defect prediction models---classifiers that identify defect-prone software modules---have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: 32 pages, accepted at IEEE Transactions on Software Engineering

  37. arXiv:1801.10269

    cs.SE

    The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

    Authors: Chakkrit Tantithamthavorn, Ahmed E. Hassan, Kenichi Matsumoto

    Abstract: Defect prediction models that are trained on class imbalanced datasets (i.e., the proportion of defective and clean modules is not equally represented) are highly susceptible to produce inaccurate prediction models. Prior research compares the impact of class rebalancing techniques on the performance of defect prediction models. Prior research efforts arrive at contradictory conclusions due to the…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: 20 pages, under review at a software engineering journal