VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

Wang, Hao; Qin, Jiayou; Bastola, Ashish; Chen, Xiwen; Suchanek, John; Gong, Zihao; Razi, Abolfazl

arXiv:2403.12415 (cs)

[Submitted on 19 Mar 2024]

Abstract: This paper explores the potential of Large Language Models (LLMs) for zero-shot anomaly detection in safe visual navigation. With the assistance of Yolo-World, a state-of-the-art real-time open-world object detection model, and specialized prompts, the proposed framework identifies anomalies, including any possible obstacles, within camera-captured frames. It then generates concise, audio-delivered descriptions that emphasize abnormalities, assisting safe visual navigation in complex circumstances. Moreover, the framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve dynamic scenario switching, which allows users to transition smoothly from scene to scene and addresses a limitation of traditional visual navigation. Furthermore, this paper explores the performance contribution of different prompt components, offers a vision for future improvement in visual accessibility, and paves the way for LLMs in video anomaly detection and vision-language understanding.
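
As a rough illustration of the pipeline the abstract describes (detector output combined with a specialized prompt for an LLM), the following minimal sketch turns a list of detections into a text prompt. It is not the authors' implementation: the `Detection` fields, the left/center/right zoning heuristic, the confidence threshold, and the prompt wording are all assumptions made for this example, and the LLM call and audio output are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object; the fields are illustrative, not the paper's schema."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def horizontal_zone(det: Detection, frame_width: float) -> str:
    """Map the box center to a coarse left/center/right zone (an assumed heuristic)."""
    center_x = (det.box[0] + det.box[2]) / 2.0
    if center_x < frame_width / 3.0:
        return "left"
    if center_x < 2.0 * frame_width / 3.0:
        return "center"
    return "right"


def build_prompt(detections: List[Detection], frame_width: float,
                 min_confidence: float = 0.3) -> str:
    """Turn detections into a text prompt asking an LLM to flag anomalies."""
    # Keep only detections at or above the (assumed) confidence threshold.
    lines = [
        f"- {d.label} ({horizontal_zone(d, frame_width)}, confidence {d.confidence:.2f})"
        for d in detections
        if d.confidence >= min_confidence
    ]
    objects = "\n".join(lines) if lines else "- no objects detected"
    return (
        "You assist a pedestrian with visual navigation.\n"
        "Objects detected in the current camera frame:\n"
        f"{objects}\n"
        "In one short sentence, describe only obstacles or abnormal situations "
        "that affect safe movement."
    )
```

In such a setup, the returned string would be sent to an LLM of choice for each sampled frame, and the model's reply would be passed to a text-to-speech component; both steps depend on the deployment and are left out here.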

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2403.12415 [cs.CV]
	(or arXiv:2403.12415v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.12415

Submission history

From: Hao Wang
[v1] Tue, 19 Mar 2024 03:55:39 UTC (8,954 KB)
