TextDiffuser: Diffusion Models as Text Painters

Chen, Jingye; Huang, Yupan; Lv, Tengchao; Cui, Lei; Chen, Qifeng; Wei, Furu

arXiv:2305.10855 (cs)

[Submitted on 18 May 2023 (v1), last revised 30 Oct 2023 (this version, v5)]

Abstract: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at this https URL.
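
The two-stage flow described in the abstract can be pictured with a toy sketch. This is not the authors' code: the real first stage is a Transformer and the real second stage is a diffusion model, whereas here both are replaced by trivial stand-ins. The names `Box`, `extract_keywords`, `plan_layout`, and `generate_image` are hypothetical, and treating single-quoted words as the keywords to render is an assumption made only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned region (in pixels) where one keyword should be drawn."""
    keyword: str
    left: int
    top: int
    right: int
    bottom: int


def extract_keywords(prompt: str) -> List[str]:
    # Assumption: the text to render is whatever appears in single quotes.
    return re.findall(r"'([^']+)'", prompt)


def plan_layout(keywords: List[str], size: int = 512) -> List[Box]:
    # Stand-in for stage one (a Transformer in the paper): stack the
    # keywords in equal-height horizontal bands with a fixed side margin.
    if not keywords:
        return []
    band = size // len(keywords)
    margin = size // 8
    return [
        Box(word, margin, i * band + band // 4,
            size - margin, i * band + 3 * band // 4)
        for i, word in enumerate(keywords)
    ]


def generate_image(prompt: str, layout: List[Box]) -> dict:
    # Placeholder for stage two (a diffusion model in the paper): it only
    # bundles the two conditioning signals, the prompt and the layout.
    return {"prompt": prompt, "layout": layout}


prompt = "a poster that says 'Hello' and 'World'"
layout = plan_layout(extract_keywords(prompt))
result = generate_image(prompt, layout)
```

The point of the sketch is the data flow only: the prompt yields keywords, the keywords yield a layout, and the image generator receives both the prompt and the layout as conditions.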

Comments:	NeurIPS 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.10855 [cs.CV]
	(or arXiv:2305.10855v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.10855

Submission history

From: Lei Cui [view email]
[v1] Thu, 18 May 2023 10:16:19 UTC (48,086 KB)
[v2] Wed, 24 May 2023 17:57:19 UTC (46,691 KB)
[v3] Wed, 7 Jun 2023 05:55:26 UTC (46,691 KB)
[v4] Tue, 13 Jun 2023 11:13:22 UTC (46,690 KB)
[v5] Mon, 30 Oct 2023 06:33:01 UTC (47,300 KB)
