Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Yoshihashi, Ryota; Otsuka, Yuya; Doi, Kenji; Tanaka, Tomohiro; Kataoka, Hirokatsu

arXiv:2309.01369 (cs)

[Submitted on 4 Sep 2023 (v1), last revised 15 Apr 2024 (this version, v2)]

Abstract: Advances in generative models for images have inspired various training techniques for image recognition that use synthetic images. In semantic segmentation, one promising approach is to extract pseudo-masks from attention maps in text-to-image diffusion models, which enables training without real images or annotations. However, DiffuMask, the pioneering training method using diffusion-synthetic images and pseudo-masks, has limitations in mask quality, scalability, and the range of applicable domains. To overcome these limitations, this work introduces three techniques for diffusion-synthetic semantic segmentation training. First, reliability-aware robust training, originally used in weakly supervised learning, helps segmentation when synthetic mask quality is insufficient. Second, we introduce prompt augmentation, a form of data augmentation applied to the prompt text set, to scale up and diversify training images with limited text resources. Finally, LoRA-based adaptation of Stable Diffusion enables transfer to a distant domain, e.g., autonomous-driving images. Experiments on PASCAL VOC, ImageNet-S, and Cityscapes show that our method effectively closes the gap between real and synthetic training in semantic segmentation.
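
To make the idea of pseudo-masks with a reliability notion more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an attention map already normalized to [0, 1] and uses two hypothetical thresholds (`lo`, `hi`): confident pixels receive a class or background label, and uncertain pixels receive an ignore label so that a segmentation loss could skip them. The function names, the thresholds, and the ignore value 255 are illustrative assumptions and do not come from the paper.

```python
from typing import List

# Label for pixels excluded from the loss. 255 is a common convention in
# segmentation datasets; its use here is an assumption for illustration.
IGNORE = 255
BACKGROUND = 0


def pseudo_mask(attn: List[List[float]], class_id: int,
                lo: float = 0.3, hi: float = 0.6) -> List[List[int]]:
    """Convert a 2-D attention map (values in [0, 1]) into a pseudo-mask.

    Hypothetical rule: attention >= hi -> class_id, attention <= lo ->
    background, anything in between -> IGNORE (treated as unreliable).
    """
    mask = []
    for row in attn:
        labels = []
        for a in row:
            if a >= hi:
                labels.append(class_id)
            elif a <= lo:
                labels.append(BACKGROUND)
            else:
                labels.append(IGNORE)
        mask.append(labels)
    return mask


def reliable_fraction(mask: List[List[int]]) -> float:
    """Fraction of pixels that carry a usable (non-ignored) label."""
    total = sum(len(row) for row in mask)
    kept = sum(1 for row in mask for v in row if v != IGNORE)
    return kept / total if total else 0.0


# Toy 2x3 attention map for a single object class (id 12 is arbitrary).
attn = [[0.05, 0.40, 0.90],
        [0.10, 0.70, 0.95]]
mask = pseudo_mask(attn, class_id=12)
print(mask)
print(reliable_fraction(mask))
```

In a training loop, the ignored pixels would typically be excluded from a per-pixel cross-entropy loss, which is one simple way to keep noisy mask regions from dominating the gradient.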

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.01369 [cs.CV]
	(or arXiv:2309.01369v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.01369

Submission history

From: Ryota Yoshihashi
[v1] Mon, 4 Sep 2023 05:34:19 UTC (7,065 KB)
[v2] Mon, 15 Apr 2024 13:29:32 UTC (14,543 KB)
