Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Pan, Haowen; Cao, Yixin; Wang, Xiaozhi; Yang, Xun; Wang, Meng

arXiv:2311.07470 (cs)

[Submitted on 13 Nov 2023 (v1), last revised 11 Jun 2024 (this version, v2)]

Abstract: Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continued improvement in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability, that is, neurons that show how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on conventional approaches in efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method that can help mitigate sensitive words or hallucination. As the rationale for our design, we provide a theoretical assumption. For empirical evaluation, we conduct extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods, but also offer insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity, and causal effect, shedding light for future research.

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2311.07470 [cs.CL] (or arXiv:2311.07470v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2311.07470

Submission history

From: Haowen Pan
[v1] Mon, 13 Nov 2023 17:03:02 UTC (7,365 KB)
[v2] Tue, 11 Jun 2024 12:30:02 UTC (12,765 KB)
