CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

Sinha, Sankalp; Khan, Muhammad Saif Ullah; Sheikh, Talha Uddin; Stricker, Didier; Afzal, Muhammad Zeshan


arXiv:2405.03660 (cs)

[Submitted on 6 May 2024]


Abstract: Zero-shot learning has been extensively investigated in visual recognition and has attracted significant interest recently. However, work on zero-shot learning for document image classification remains scarce. Existing studies either focus exclusively on zero-shot inference or use an evaluation that does not align with the established zero-shot evaluation criteria of the visual recognition domain. To address this gap, we provide a comprehensive analysis of document image classification in the Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and its GZSL harmonic mean by 24% on the RVL-CDIP dataset. The module is lightweight and adds only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification.
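
The abstract reports a GZSL harmonic mean without spelling out how it is computed. In the visual recognition literature, the GZSL harmonic mean is commonly defined as H = 2 * S * U / (S + U), where S and U are the accuracies on seen and unseen classes. The sketch below illustrates that convention only; whether the paper averages accuracy per class or per sample is not stated in the abstract, so the per-class averaging here is an assumption, and the function names and toy labels are hypothetical.

```python
from typing import Sequence


def per_class_top1(y_true: Sequence[int], y_pred: Sequence[int],
                   classes: Sequence[int]) -> float:
    """Mean of per-class top-1 accuracies over `classes` (assumed convention)."""
    accs = []
    for c in classes:
        # Indices of the samples whose ground-truth label is class c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        if not idx:
            continue  # skip classes that have no samples
        accs.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(accs) / len(accs) if accs else 0.0


def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """GZSL harmonic mean H = 2*S*U / (S + U); defined as 0 when S + U is 0."""
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


if __name__ == "__main__":
    # Toy labels (hypothetical): classes 0 and 1 are "seen", class 2 is "unseen".
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    s = per_class_top1(y_true, y_pred, classes=[0, 1])
    u = per_class_top1(y_true, y_pred, classes=[2])
    print(s, u, harmonic_mean(s, u))
```

The harmonic mean penalizes an imbalance between the two accuracies, so a model that scores well on seen classes but near zero on unseen ones still receives a low H.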

Comments:	18 pages, 4 figures; accepted at ICDAR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.03660 [cs.CV]
	(or arXiv:2405.03660v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.03660

Submission history

From: Sankalp Sinha
[v1] Mon, 6 May 2024 17:37:23 UTC (1,863 KB)
