MulT: An End-to-End Multitask Learning Transformer

Bhattacharjee, Deblina; Zhang, Tong; Süsstrunk, Sabine; Salzmann, Mathieu

arXiv:2205.08303 (cs)

[Submitted on 17 May 2022]

Abstract: We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at this https URL.
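
The abstract describes the layout only at a high level: one shared encoding, one decoder head per task, and attention shared across the tasks. The sketch below is one plausible reading of that layout, not the authors' implementation. It assumes the attention weights are computed once from the shared token features and reused by every task head, with each head applying its own value projection. The function names, token count, feature width, and per-task output sizes are all hypothetical, and the random token matrix merely stands in for the output of a Swin encoder.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def shared_attention(tokens, w_q, w_k):
    """Scaled dot-product attention weights, computed once for all tasks (assumption)."""
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    k_t = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(w_q[0]))
    scores = [[v / scale for v in row] for row in matmul(q, k_t)]
    return softmax_rows(scores)


def multitask_forward(tokens, w_q, w_k, task_value_weights):
    """Reuse one attention map in every task head; only the value projection differs."""
    attn = shared_attention(tokens, w_q, w_k)
    return {
        task: matmul(attn, matmul(tokens, w_v))
        for task, w_v in task_value_weights.items()
    }


rng = random.Random(0)
n_tokens, dim = 4, 8                        # hypothetical sizes
tokens = random_matrix(n_tokens, dim, rng)  # stand-in for the shared encoder output
w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)

# Hypothetical per-token output widths for three of the tasks named in the abstract.
out_dims = {"depth": 1, "segmentation": 5, "normals": 3}
task_value_weights = {t: random_matrix(dim, d, rng) for t, d in out_dims.items()}

attention = shared_attention(tokens, w_q, w_k)
predictions = multitask_forward(tokens, w_q, w_k, task_value_weights)
```

Under this reading, every head mixes tokens with the same weights, so the token-to-token dependencies are shared across tasks while each head keeps its own output space.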

Comments:	Accepted to CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2205.08303 [cs.CV]
	(or arXiv:2205.08303v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2205.08303

Submission history

From: Deblina Bhattacharjee
[v1] Tue, 17 May 2022 13:03:18 UTC (18,680 KB)
