VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Liu, Yu; Gao, Lang; Yang, Mingxin; Xie, Yu; Chen, **; Zhang, Xiao**; Chen, Wei

arXiv:2406.07595v3 (cs)

[Submitted on 11 Jun 2024 (v1), last revised 24 Jun 2024 (this version, v3)]

Abstract: Large Language Models (LLMs) are trained on corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, sound, comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at this https URL.

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as:	arXiv:2406.07595 [cs.CR]
	(or arXiv:2406.07595v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2406.07595

Submission history

From: Yu Liu
[v1] Tue, 11 Jun 2024 13:42:57 UTC (1,791 KB)
[v2] Fri, 14 Jun 2024 04:36:42 UTC (1,791 KB)
[v3] Mon, 24 Jun 2024 09:02:57 UTC (2,283 KB)
