Efficient Distributed Vision Transformer Foundation Model for Medical Imaging through Random Masked Sampling

Park, Sangjoon; Lee, Ik-Jae; Kim, Jun Won; Ye, Jong Chul

arXiv:2301.02064v2 (cs)

[Submitted on 5 Jan 2023 (v1), revised 5 Feb 2023 (this version, v2), latest version 15 Apr 2023 (v3)]

Abstract: Despite the recent success of deep learning in the medical domain, data scarcity there is aggravated by privacy and data ownership issues. Distributed learning approaches, including federated learning, have been studied to alleviate these problems, but they suffer from cumbersome communication overhead and weak privacy protection. To address this, we propose a self-supervised masked sampling distillation method for vision transformers that can be performed without continuous communication yet still enhances privacy through a vision transformer-specific encryption method. The effectiveness of our method is demonstrated with extensive experiments on two kinds of medical domain data and two different downstream tasks, showing performance superior to that of the existing distributed learning strategy as well as the fine-tuning-only baseline. Because the self-supervised model built with the proposed method acquires a general semantic understanding of the modality, we demonstrate its potential as a task-agnostic foundation model for various medical tasks, widening its applicability in the medical domain.
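
The abstract names random masked sampling but does not spell out the sampling step. As a rough illustration only, the sketch below shows one way a random subset of vision transformer patch indices could be kept while the rest are masked. The function name `random_masked_sample`, the keep ratio of 0.25, and the 196-patch layout (a 224x224 image with 16x16 patches) are assumptions chosen for the example and are not taken from the paper; the distillation and encryption components are not modeled here.

```python
import random


def random_masked_sample(num_patches, keep_ratio, seed=None):
    """Split patch indices into a random kept subset and a masked remainder.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least one patch so the encoder always has some input.
    num_keep = max(1, int(num_patches * keep_ratio))
    # Sample without replacement, then sort to preserve spatial order.
    kept = sorted(rng.sample(range(num_patches), num_keep))
    kept_set = set(kept)
    masked = [i for i in range(num_patches) if i not in kept_set]
    return kept, masked


# Assumed example values: 14 * 14 = 196 patches, 25% of them kept.
kept, masked = random_masked_sample(num_patches=196, keep_ratio=0.25, seed=0)
print(len(kept), len(masked))  # prints "49 147"
```

In a setup like this, only the kept patch tokens would be passed to the encoder, which is one reason masked sampling can reduce the amount of data that has to be processed or shared per image.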

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2301.02064 [cs.CV]
	(or arXiv:2301.02064v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2301.02064

Submission history

From: Jong Chul Ye
[v1] Thu, 5 Jan 2023 13:47:36 UTC (9,336 KB)
[v2] Sun, 5 Feb 2023 15:06:28 UTC (11,077 KB)
[v3] Sat, 15 Apr 2023 06:36:01 UTC (3,814 KB)
