OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

Wang, Shuai; Ding, Liang; Shen, Li; Luo, Yong; Du, Bo; Tao, Dacheng

Computer Science > Computation and Language

arXiv:2401.06628 (cs)

[Submitted on 12 Jan 2024 (v1), last revised 21 Feb 2024 (this version, v2)]

Abstract: Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks, e.g., HumanEval and MBPP, largely neglect object-oriented programming (OOP) in favor of functional programming (FP). To address this, our study introduces a pioneering OOP-focused benchmark, featuring 431 Python programs that encompass essential OOP concepts and features like classes and encapsulation methods. We propose a novel evaluation metric, pass@o, tailored for OOP, enhancing traditional pass@k measures. Our evaluation of 23 leading large language models (LLMs), including both general and code-specialized models, reveals three key insights: 1) pass@o offers a more relevant and comprehensive assessment for OOP code generation; 2) despite excelling in FP, code-specialized LLMs like WizardCoder lag in OOP compared to models like ChatGPT; 3) the poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvements in this field. Our benchmark and scripts are publicly released at: this https URL.
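For context on the traditional pass@k measure that the abstract says pass@o builds on, the widely used unbiased estimator can be sketched as below. This is only an illustration of pass@k itself: the abstract does not define pass@o, so nothing here should be read as the authors' metric. The function name `pass_at_k` and the sample counts in the usage lines are assumptions chosen for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: budget of samples the metric is evaluated at

    Uses the standard unbiased form 1 - C(n - c, k) / C(n, k),
    i.e. one minus the probability that k samples drawn without
    replacement from the n are all failures.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 20 samples per problem, 5 of them correct.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(20, 5, k):.4f}")
```

A benchmark-level score would typically average this per-problem value over all problems in the benchmark.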

Comments:	20 pages, 15 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.06628 [cs.CL]
	(or arXiv:2401.06628v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.06628

Submission history

From: Shuai Wang
[v1] Fri, 12 Jan 2024 15:21:36 UTC (799 KB)
[v2] Wed, 21 Feb 2024 06:18:16 UTC (8,791 KB)
