ResViT: Residual vision transformers for multi-modal medical image synthesis

Dalmaz, Onat; Yurt, Mahmut; Çukur, Tolga

doi:10.1109/TMI.2022.3167808

arXiv:2106.16031 (eess)

[Submitted on 30 Jun 2021 (v1), last revised 6 Mar 2022 (this version, v3)]

Abstract: Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
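To make the data flow described in the abstract more concrete, the following is a minimal NumPy sketch of an ART-style block. It is an illustration under stated assumptions, not the authors' implementation: the function names (`self_attention`, `conv1x1`, `conv3x3`, `art_block`), the single-head attention, the tensor sizes, and the choice to share only the transformer weights across blocks are all hypothetical. The sketch shows contextual features from a transformer module being concatenated with the block input through a skip connection, compressed back to the original channel count, and refined by a residual convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over tokens of shape (N, C)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def conv3x3(x, w):
    """Zero-padded 3x3 convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], window)
    return out

def make_transformer_params(channels):
    """Hypothetical attention weights; one set can be shared by several blocks."""
    scale = channels ** -0.5
    return {name: rng.normal(scale=scale, size=(channels, channels))
            for name in ("w_q", "w_k", "w_v")}

def make_block_params(channels):
    """Hypothetical per-block weights for channel compression and the residual conv."""
    return {
        "compress": rng.normal(scale=(2 * channels) ** -0.5,
                               size=(channels, 2 * channels)),
        "conv": rng.normal(scale=(9 * channels) ** -0.5,
                           size=(channels, channels, 3, 3)),
    }

def art_block(x, transformer, block):
    """ART-style block (assumed layout): transformer + skip, compression, residual conv."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                          # (N, C) token sequence
    context = tokens + self_attention(tokens, **transformer)
    context = context.T.reshape(c, h, w)                    # back to a feature map
    merged = np.concatenate([x, context], axis=0)           # skip connection: (2C, H, W)
    compressed = conv1x1(merged, block["compress"])         # channel compression: (C, H, W)
    return compressed + np.maximum(conv3x3(compressed, block["conv"]), 0.0)

channels = 8
shared_transformer = make_transformer_params(channels)      # tied across all blocks
blocks = [make_block_params(channels) for _ in range(3)]

features = rng.normal(size=(channels, 4, 4))
out = features
for block in blocks:
    out = art_block(out, shared_transformer, block)
```

In this sketch the three blocks reuse one set of attention weights, so the attention parameter count stays constant as blocks are added, while each block keeps its own compression and convolution weights. Whether this matches the exact sharing scheme in ResViT is an assumption; the abstract only states that weights are shared among ART blocks.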

Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.16031 [eess.IV]
	(or arXiv:2106.16031v3 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2106.16031
Related DOI:	https://doi.org/10.1109/TMI.2022.3167808

Submission history

From: Onat Dalmaz
[v1] Wed, 30 Jun 2021 12:57:37 UTC (15,431 KB)
[v2] Mon, 11 Oct 2021 18:08:05 UTC (23,613 KB)
[v3] Sun, 6 Mar 2022 11:07:38 UTC (34,389 KB)
