SparseSwin: Swin Transformer with Sparse Transformer Block

Pinasthika, Krisna; Laksono, Blessius Sheldo Putra; Irsal, Riyandi Banovbi Putera; Shabiyya, Syifa Hukma; Yudistira, Novanto

doi:10.1016/j.neucom.2024.127433

arXiv:2309.05224 (cs)

[Submitted on 11 Sep 2023]

Authors:Krisna Pinasthika, Blessius Sheldo Putra Laksono, Riyandi Banovbi Putera Irsal, Syifa Hukma Shabiyya, Novanto Yudistira

Abstract: Advances in computer vision research have made the transformer architecture the state of the art in computer vision tasks. One known drawback of the transformer architecture is its high number of parameters, which can lead to a more complex and less efficient algorithm. This paper aims to reduce the number of parameters and, in turn, make the transformer more efficient. We present the Sparse Transformer (SparTa) Block, a modified transformer block with an added sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin T architecture (SparseSwin) to leverage Swin's ability to downsample its input and reduce the number of initial tokens to be processed. The proposed SparseSwin model outperforms other state-of-the-art models in image classification, with accuracies of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets, respectively. Achieved with fewer parameters, this result highlights the potential of a transformer architecture that uses a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
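To make the token-reduction idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a token converter can be approximated by a weight matrix that mixes N input tokens into K output tokens, and it counts the pairwise attention scores before and after the reduction. The names `reduce_tokens` and `attention_score_count`, the sizes N = 196, K = 49, and DIM = 8, and the random weights are all hypothetical and are not taken from the paper.

```python
import random

def reduce_tokens(tokens, weights):
    """Mix N input tokens into K output tokens.

    out[i] = sum_j weights[i][j] * tokens[j], so `weights` has shape (K, N).
    This is an illustrative stand-in for a sparse token converter.
    """
    dim = len(tokens[0])
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]

def attention_score_count(num_tokens):
    """Self-attention compares every token with every token: N * N scores per head."""
    return num_tokens * num_tokens

random.seed(0)
N, K, DIM = 196, 49, 8  # hypothetical sizes, not taken from the paper

# Random values stand in for real patch embeddings and learned weights.
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
weights = [[random.gauss(0, 1) / N for _ in range(N)] for _ in range(K)]

sparse_tokens = reduce_tokens(tokens, weights)
print(len(sparse_tokens), len(sparse_tokens[0]))  # 49 8
print(attention_score_count(N), attention_score_count(K))  # 38416 2401
```

Under these assumed sizes, the attention score matrix shrinks by a factor of 16. The sketch only illustrates why fewer tokens make attention cheaper; it says nothing about the parameter counts or accuracies reported in the abstract.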

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.05224 [cs.CV]
	(or arXiv:2309.05224v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.05224
Journal reference:	Neurocomputing, Volume 580, 2024, 127433
Related DOI:	https://doi.org/10.1016/j.neucom.2024.127433

Submission history

From: Krisna Pinasthika
[v1] Mon, 11 Sep 2023 04:03:43 UTC (560 KB)
