AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

Gu, Zhaopeng; Zhu, Bingke; Zhu, Guibo; Chen, Yingying; Tang, Ming; Wang, Jinqiao

arXiv:2308.15366v1 (cs)

[Submitted on 29 Aug 2023 (this version), latest version 28 Dec 2023 (v4)]

Abstract: Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and require thresholds to be set manually to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the use of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on an LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. AnomalyGPT eliminates the need for manual threshold adjustments and thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at this https URL.
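
The abstract reports both accuracy and AUC, and notes that score-only methods need a manually chosen threshold. The sketch below is an illustration of that distinction only, not code from AnomalyGPT: the scores, labels, and the helper names `auc` and `accuracy_at` are hypothetical. It shows that AUC depends only on how the scores rank the samples, while accuracy changes with the threshold chosen.

```python
def auc(scores, labels):
    """Threshold-free AUROC: the probability that a random anomalous sample
    scores higher than a random normal one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(scores, labels, threshold):
    """Accuracy after converting scores to decisions with a hand-picked threshold."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical anomaly scores (higher = more anomalous) and labels (1 = anomalous).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 1, 1]

print("AUC:", auc(scores, labels))
for t in (0.3, 0.38, 0.9):
    print("threshold", t, "accuracy", accuracy_at(scores, labels, t))
```

On this toy data the ranking is perfect, so the AUC is 1.0, yet accuracy ranges from 0.5 to 1.0 depending on the threshold. That sensitivity is the practical burden the abstract attributes to score-only methods.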

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.15366 [cs.CV]
	(or arXiv:2308.15366v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.15366

Submission history

From: Zhaopeng Gu
[v1] Tue, 29 Aug 2023 15:02:53 UTC (13,527 KB)
[v2] Mon, 4 Sep 2023 11:44:48 UTC (13,527 KB)
[v3] Wed, 13 Sep 2023 14:58:14 UTC (13,527 KB)
[v4] Thu, 28 Dec 2023 08:22:14 UTC (13,527 KB)
