LT-ViT: A Vision Transformer for multi-label Chest X-ray classification

Marikkar, Umar; Atito, Sara; Awais, Muhammad; Mahdi, Adam

doi:10.1109/ICIP49359.2023.10222175

arXiv:2311.07263 (cs)

[Submitted on 13 Nov 2023]

Abstract: Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for chest X-rays (CXRs). However, we believe there is still room for improvement in vision-only training for CXRs using ViTs by aggregating information from multiple scales, which has proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that uses combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance using pure ViTs on two publicly available CXR datasets, (2) generalizes to other pre-training methods and is therefore agnostic to model initialization, and (3) enables model interpretability without Grad-CAM and its variants.
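The combined attention between image tokens and label tokens mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify layer placement, projections, or how multiple scales are aggregated. The sketch assumes a single attention head with identity query/key/value projections, one randomly initialized token per label, and a hypothetical per-label linear head followed by a sigmoid (so labels are scored independently, as in multi-label classification). All function and variable names are illustrative.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(image_tokens, label_tokens):
    """Single-head self-attention over the concatenation [image; label].

    Assumption: identity Q/K/V projections, chosen only to keep the sketch short.
    Returns (updated_tokens, attention_weights), both ordered image-first.
    """
    tokens = image_tokens + label_tokens
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated, weights = [], []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        updated.append(
            [sum(row[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return updated, weights


def label_probabilities(label_outputs, classifier_weights):
    """One independent sigmoid score per label token (hypothetical linear heads)."""
    probabilities = []
    for token, weight in zip(label_outputs, classifier_weights):
        logit = sum(t * w for t, w in zip(token, weight))
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
    return probabilities


rng = random.Random(0)


def random_tokens(count, dim):
    """Randomly initialized tokens, standing in for learned embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


NUM_PATCHES, NUM_LABELS, DIM = 4, 3, 8
image_tokens = random_tokens(NUM_PATCHES, DIM)        # stand-in for patch embeddings
label_tokens = random_tokens(NUM_LABELS, DIM)         # one auxiliary token per label
classifier_weights = random_tokens(NUM_LABELS, DIM)   # hypothetical per-label heads

updated, weights = joint_attention(image_tokens, label_tokens)
label_outputs = updated[NUM_PATCHES:]
probabilities = label_probabilities(label_outputs, classifier_weights)

# Attention from each label token to the image patches only.
label_to_patch = [row[:NUM_PATCHES] for row in weights[NUM_PATCHES:]]

print("per-label probabilities:", probabilities)
print("label-to-patch attention:", label_to_patch)
```

In this sketch, each row of `label_to_patch` shows how strongly one label token attends to each image patch. Attention rows of this kind are one plausible route to the interpretability without Grad-CAM that the abstract claims, though the abstract does not say how the authors derive their maps.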

Comments:	5 pages, 2 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2311.07263 [cs.CV]
	(or arXiv:2311.07263v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.07263
Related DOI:	https://doi.org/10.1109/ICIP49359.2023.10222175

Submission history

From: Umar Marikkar
[v1] Mon, 13 Nov 2023 12:02:46 UTC (2,638 KB)
