Anomaly Detection of Tabular Data Using LLMs

Li, Aodong; Zhao, Yunhan; Qiu, Chen; Kloft, Marius; Smyth, Padhraic; Rudolph, Maja; Mandt, Stephan

arXiv:2406.16308 (cs)

[Submitted on 24 Jun 2024]

Abstract: Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. For LLMs that are not well aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) show that (i) GPT-4 performs on par with state-of-the-art transductive learning-based anomaly detection methods and (ii) our synthetic dataset and fine-tuning strategy are effective in aligning LLMs to this task.
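
To make the batch-level setting concrete, the following is a minimal sketch of how a synthetic batch with a few hidden outliers might be simulated and serialized into a text prompt for an LLM. It is not the authors' data-generating process, which the abstract does not specify. The Gaussian inliers, the uniform far-range anomalies, the function names `make_synthetic_batch` and `serialize_batch`, and the prompt wording are all assumptions chosen for illustration.

```python
import random


def make_synthetic_batch(n_inliers=18, n_outliers=2, seed=0):
    """Return (values, anomaly_indices) for one synthetic 1-D batch.

    Assumption: inliers are drawn from a standard Gaussian and anomalies
    from a uniform range far away from it. This is an illustrative choice,
    not the data-generating process used in the paper.
    """
    rng = random.Random(seed)
    inliers = [round(rng.gauss(0.0, 1.0), 2) for _ in range(n_inliers)]
    outliers = [
        round(rng.choice([-1, 1]) * rng.uniform(8.0, 12.0), 2)
        for _ in range(n_outliers)
    ]
    # Tag each value so the ground truth survives shuffling.
    labeled = [(v, False) for v in inliers] + [(v, True) for v in outliers]
    rng.shuffle(labeled)
    values = [v for v, _ in labeled]
    anomaly_indices = [i for i, (_, is_anom) in enumerate(labeled) if is_anom]
    return values, anomaly_indices


def serialize_batch(values):
    """Turn a batch into a plain-text prompt, one data point per line.

    The wording of the question is hypothetical.
    """
    rows = "\n".join(f"Data {i + 1}: {v}" for i, v in enumerate(values))
    return rows + "\nWhich data points are abnormal? Answer with their indices."


if __name__ == "__main__":
    values, anomaly_indices = make_synthetic_batch()
    print(serialize_batch(values))
    print("Ground-truth anomaly positions (0-based):", anomaly_indices)
```

Under these assumptions, the prompt would be sent to the model zero-shot, and the ground-truth indices could serve either to score its answer or as the target text in a fine-tuning pair.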

Comments:	accepted at the Anomaly Detection with Foundation Models workshop
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2406.16308 [cs.LG]
	(or arXiv:2406.16308v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.16308

Submission history

From: Aodong Li
[v1] Mon, 24 Jun 2024 04:17:03 UTC (928 KB)
