Showing 1–26 of 26 results for author: Zhang, J M

Searching in archive cs.
  1. arXiv:2406.09843  [pdf, other]

    cs.SE

    An Exploratory Study on Using Large Language Models for Mutation Testing

    Authors: Bo Wang, Mingda Chen, Youfang Lin, Mike Papadakis, Jie M. Zhang

    Abstract: The question of how to generate high-utility mutations, to be used for testing purposes, forms a key challenge in mutation testing literature. Existing approaches rely either on human-specified syntactic rules or learning-based approaches, all of which produce large numbers of redundant mutants. Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mut…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 3 figures

    ACM Class: D.2.5

  2. arXiv:2405.15189  [pdf, other]

    cs.SE cs.CL

    SOAP: Enhancing Efficiency of Generated Code via Self-Optimization

    Authors: Dong Huang, Jianbo Dai, Han Weng, Puzhen Wu, Yuhao Qing, Jie M. Zhang, Heming Cui, Zhijiang Guo

    Abstract: Large language models (LLMs) have shown remarkable progress in code generation, but their generated code often suffers from inefficiency, resulting in longer execution times and higher memory consumption. To address this issue, we propose Self Optimization based on OverheAd Profile (SOAP), a self-optimization framework that utilizes execution overhead profiles to improve the efficiency of LLM-gene…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 31 pages, 18 figures, and 8 tables

  3. arXiv:2404.10304  [pdf, other]

    cs.SE cs.LG

    LLM-Powered Test Case Generation for Detecting Tricky Bugs

    Authors: Kaibo Liu, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Yudong Han, Yun Ma, Ge Li, Gang Huang

    Abstract: Conventional automated test generation tools struggle to generate test oracles and tricky bug-revealing test inputs. Large Language Models (LLMs) can be prompted to produce test inputs and oracles for a program directly, but the precision of the tests can be very low for complex scenarios (only 6.3% based on our experiments). To fill this gap, this paper proposes AID, which combines LLMs with diff…

    Submitted 16 April, 2024; originally announced April 2024.

  4. arXiv:2404.06852  [pdf, other]

    cs.SE

    Research Artifacts in Software Engineering Publications: Status and Trends

    Authors: Mugeng Liu, Xiaolong Huang, Wei He, Yibing Xie, Jie M. Zhang, Xiang Jing, Zhenpeng Chen, Yun Ma

    Abstract: The Software Engineering (SE) community has been embracing the open science policy and encouraging researchers to disclose artifacts in their publications. However, the status and trends of artifact practice and quality remain unclear, lacking insights on further improvement. In this paper, we present an empirical study to characterize the research artifacts in SE publications. Specifically, we ma…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by Journal of Systems and Software (JSS 2024). Please include JSS in any citations

  5. arXiv:2402.02037  [pdf, other]

    cs.SE cs.CL

    EffiBench: Benchmarking the Efficiency of Automatically Generated Code

    Authors: Dong Huang, Yuhao Qing, Weiyi Shang, Heming Cui, Jie M. Zhang

    Abstract: Code generation models have increasingly become integral to aiding software development. Although current research has thoroughly examined the correctness of the code produced by code generation models, a vital aspect that plays a pivotal role in green computing and sustainability efforts has often been neglected. This paper presents EffiBench, a benchmark with 1,000 efficiency-critical coding pro…

    Submitted 7 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 30 pages, 7 figures

  6. arXiv:2312.13010  [pdf, other]

    cs.CL

    AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

    Authors: Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui

    Abstract: The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case gen…

    Submitted 24 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 24 pages, 12 figures

  7. arXiv:2310.16253  [pdf, other]

    cs.SE cs.AI

    ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair

    Authors: Yonghao Wu, Zheng Li, Jie M. Zhang, Yong Liu

    Abstract: With the growing interest in Large Language Models (LLMs) for fault localization and program repair, ensuring the integrity and generalizability of the LLM-based methods becomes paramount. The code in existing widely-adopted benchmarks for these tasks was written before the bloom of LLMs and may be included in the training data of existing popular LLMs, thereby suffering from the threat of dat…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 5 pages, 3 figures

  8. arXiv:2310.03533  [pdf, other]

    cs.SE

    Large Language Models for Software Engineering: Survey and Open Problems

    Authors: Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang

    Abstract: This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requir…

    Submitted 11 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  9. arXiv:2308.15276  [pdf, other]

    cs.SE

    Large Language Models in Fault Localisation

    Authors: Yonghao Wu, Zheng Li, Jie M. Zhang, Mike Papadakis, Mark Harman, Yong Liu

    Abstract: Large Language Models (LLMs) have shown promise in multiple software engineering tasks including code generation, program repair, code summarisation, and test generation. Fault localisation is instrumental in enabling automated debugging and repair of programs and was prominently featured as a highlight during the launch event of ChatGPT-4. Nevertheless, the performance of LLMs compared to state-o…

    Submitted 2 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  10. arXiv:2308.13319  [pdf, other]

    cs.SE

    COCO: Testing Code Generation Systems via Concretized Instructions

    Authors: Ming Yan, Junjie Chen, Jie M. Zhang, Xuejie Cao, Chen Yang, Mark Harman

    Abstract: Code generation systems have been extensively developed in recent years to generate source code based on natural language instructions. However, despite their advancements, these systems still face robustness issues where even slightly different instructions can result in significantly different code semantics. Robustness is critical for code generation systems, as it can have significant impacts…

    Submitted 25 August, 2023; originally announced August 2023.

  11. arXiv:2308.02935  [pdf, other]

    cs.CY cs.AI cs.CV cs.SE

    Bias Behind the Wheel: Fairness Analysis of Autonomous Driving Systems

    Authors: Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Ying Zhang, Xuanzhe Liu

    Abstract: This paper analyzes fairness in automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight state-of-the-art deep learning-based pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender…

    Submitted 4 April, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

  12. arXiv:2308.02828  [pdf, other]

    cs.SE

    LLM is Like a Box of Chocolates: the Non-determinism of ChatGPT in Code Generation

    Authors: Shuyin Ouyang, Jie M. Zhang, Mark Harman, Meng Wang

    Abstract: There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable; nondeterministically returning very different codes for the same prompt. Non-determinism is a potential menace to scientific conclusion validity. When non-determinism is high, scientific conclusions simply ca…

    Submitted 5 August, 2023; originally announced August 2023.

  13. arXiv:2308.01923  [pdf, other]

    cs.LG cs.AI cs.CY cs.SE

    Fairness Improvement with Multiple Protected Attributes: How Far Are We?

    Authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

    Abstract: Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effective…

    Submitted 4 April, 2024; v1 submitted 25 July, 2023; originally announced August 2023.

    Comments: Accepted by the 46th International Conference on Software Engineering (ICSE 2024). Please include ICSE in any citations

  14. arXiv:2302.04675  [pdf, other]

    cs.SE

    Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning

    Authors: Xin-Cheng Wen, Yupan Chen, Cuiyun Gao, Hongyu Zhang, Jie M. Zhang, Qing Liao

    Abstract: Prior studies have demonstrated the effectiveness of Deep Learning (DL) in automated software vulnerability detection. Graph Neural Networks (GNNs) have proven effective in learning the graph representations of source code and are commonly adopted by existing DL-based vulnerability detection methods. However, the existing methods are still limited by the fact that GNNs are essentially difficult to…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: 13 pages, 8 figures, Accepted for publication in the ICSE 23 Technical Track

  15. arXiv:2301.02496  [pdf, other]

    cs.CR cs.SE

    Stealthy Backdoor Attack for Code Models

    Authors: Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He, David Lo

    Abstract: Code models, such as CodeBERT and CodeT5, offer general-purpose representations of code and play a vital role in supporting downstream automated software engineering tasks. Most recently, code models were revealed to be vulnerable to backdoor attacks. A code model that is backdoor-attacked can behave normally on clean examples but will produce pre-defined malicious outputs on examples injected wit…

    Submitted 28 August, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: 18 pages, under review at IEEE Transactions on Software Engineering

  16. arXiv:2207.10223  [pdf, other]

    cs.SE

    Fairness Testing: A Comprehensive Survey and Analysis of Trends

    Authors: Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, Federica Sarro

    Abstract: Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test)…

    Submitted 6 March, 2024; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM 2024). Please include TOSEM in any citations

  17. arXiv:2207.07068  [pdf, other]

    cs.LG

    Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey

    Authors: Max Hort, Zhenpeng Chen, Jie M. Zhang, Mark Harman, Federica Sarro

    Abstract: This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 341 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technique they apply. We investigate how existing bi…

    Submitted 11 October, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 52 pages, 7 figures

  18. arXiv:2207.03277  [pdf, other]

    cs.SE cs.AI

    A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers

    Authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

    Abstract: Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers, evaluated with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely-adopted softwar…

    Submitted 10 February, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM 2023). Please include TOSEM in any citations

  19. arXiv:2110.06773  [pdf, other]

    cs.SE cs.CL cs.LG

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    Authors: Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample

    Abstract: With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive…

    Submitted 16 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  20. arXiv:1910.07428  [pdf, other]

    cs.HC cs.CV

    Gaze Gestures and Their Applications in human-computer interaction with a head-mounted display

    Authors: W. X. Chen, X. Y. Cui, J. Zheng, J. M. Zhang, S. Chen, Y. D. Yao

    Abstract: A head-mounted display (HMD) is a portable and interactive display device. With the development of 5G technology, it may become a general-purpose computing platform in the future. Human-computer interaction (HCI) technology for HMDs has also been of significant interest in recent years. In addition to tracking gestures and speech, tracking human eyes as a means of interaction is highly effective.…

    Submitted 16 October, 2019; originally announced October 2019.

  21. arXiv:1910.02688  [pdf, other]

    cs.SE

    Automatic Testing and Improvement of Machine Translation

    Authors: Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang

    Abstract: This paper presents TransRepair, a fully automatic approach for testing and repairing the consistency of machine translation systems. TransRepair combines mutation with metamorphic testing to detect inconsistency bugs (without access to human oracles). It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsis…

    Submitted 25 December, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

  22. arXiv:1906.10742  [pdf, other]

    cs.LG cs.AI cs.SE stat.ML

    Machine Learning Testing: Survey, Landscapes and Horizons

    Authors: Jie M. Zhang, Mark Harman, Lei Ma, Yang Liu

    Abstract: This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper…

    Submitted 21 December, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

  23. arXiv:1905.10201  [pdf, other]

    cs.LG stat.ML

    Model Validation Using Mutated Training Labels: An Exploratory Study

    Authors: Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor

    Abstract: We introduce an exploratory study on Mutation Validation (MV), a model validation method using mutated training labels for supervised learning. MV mutates training data labels, retrains the model against the mutated data, then uses the metamorphic relation that captures the consequent training performance changes to assess model fit. It does not use a validation set or test set. The intuition unde…

    Submitted 20 October, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

  24. arXiv:1811.07457  [pdf, other]

    cs.LG stat.ML

    Generalizable Adversarial Training via Spectral Normalization

    Authors: Farzan Farnia, Jesse M. Zhang, David Tse

    Abstract: Deep neural networks (DNNs) have set benchmarks on a wide array of supervised learning tasks. Trained DNNs, however, often lack robustness to minor adversarial perturbations to the input, which undermines their true practicality. Recent works have increased the robustness of DNNs by fitting networks using adversarially-perturbed training samples, but the improved performance can still be far below…

    Submitted 18 November, 2018; originally announced November 2018.

  25. arXiv:1801.01025  [pdf, other]

    cs.SE

    A Study of Bug Resolution Characteristics in Popular Programming Languages

    Authors: Jie M. Zhang, Feng Li, Dan Hao, Meng Wang, Hao Tang, Lu Zhang, Mark Harman

    Abstract: This paper presents a large-scale study that investigates the bug resolution characteristics among popular GitHub projects written in different programming languages. We explore correlations but, of course, we cannot infer causation. Specifically, we analyse bug resolution data from approximately 70 million Source Lines of Code, drawn from 3 million commits to 600 GitHub projects, primarily written…

    Submitted 4 January, 2020; v1 submitted 3 January, 2018; originally announced January 2018.

    Journal ref: Transactions on Software Engineering 2020

  26. arXiv:1709.05871  [pdf]

    cs.DC

    IBM Deep Learning Service

    Authors: Bishwaranjan Bhattacharjee, Scott Boag, Chandani Doshi, Parijat Dube, Ben Herta, Vatche Ishakian, K. R. Jayaram, Rania Khalaf, Avesh Krishna, Yu Bo Li, Vinod Muthusamy, Ruchir Puri, Yufei Ren, Florian Rosenberg, Seetharami R. Seelam, Yandong Wang, Jian Ming Zhang, Li Zhang

    Abstract: Deep learning driven by large neural network models is overtaking traditional machine learning methods for understanding unstructured and perceptual data domains such as speech, text, and vision. At the same time, the "as-a-Service"-based business model on the cloud is fundamentally transforming the information technology industry. These two trends: deep learning, and "as-a-service" are colliding…

    Submitted 18 September, 2017; originally announced September 2017.