Showing 1–3 of 3 results for author: Min, M J

  1. arXiv:2406.01006  [pdf, other]

    cs.CL cs.AI cs.SE

    SemCoder: Training Code Language Models with Comprehensive Semantics

    Authors: Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with c…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2403.18746  [pdf, other]

    cs.SE cs.CL

    CYCLE: Learning to Self-Refine the Code Generation

    Authors: Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray

    Abstract: Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actua…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Camera-ready for OOPSLA'24

  3. arXiv:2310.14053  [pdf, other]

    cs.LG cs.CL cs.SE

    Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

    Authors: Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications f…

    Submitted 26 February, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

    MSC Class: 68 ACM Class: I.2; D.2