
ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning

Authors: Alec Helbling, Evan Montoya, Duen Horng Chau

Abstract: Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to consistently generate the same objects in different contexts with the same appearance. Consistent object generation is important to many downstream tasks like generating comic book illustrations with consistent characters and setting. Numerous approaches attempt to solve this problem by extending the vocabulary of diffusion models through fine-tuning. However, even lightweight fine-tuning approaches can be prohibitively expensive to run at scale and in real-time. We introduce a method called ObjectComposer for generating compositions of multiple objects that resemble user-specified images. Our approach is training-free, leveraging the abilities of preexisting models. We build upon the recent BLIP-Diffusion model, which can generate images of single objects specified by reference images. ObjectComposer enables the consistent generation of compositions containing multiple specific objects simultaneously, all without modifying the weights of the underlying models.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2210.14896 [pdf, other]

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models

Authors: Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, Duen Horng Chau

Abstract: With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.

Submitted 6 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: Accepted to ACL 2023 (nominated for best paper, top 1.6% of submissions, oral presentation). 17 pages, 11 figures. The dataset is available at https://huggingface.co/datasets/poloclub/diffusiondb. The code is at https://github.com/poloclub/diffusiondb. The interactive visualization demo is at https://poloclub.github.io/diffusiondb/explorer/

arXiv:2210.13510 [pdf, other]

Evaluation of Argo Scholar with Observational Study

Authors: Kevin Li, Haoyang Yang, Evan Montoya, Anish Upadhayay, Zhiyan Zhou, Jon Saad-Falcon, Duen Horng Chau

Abstract: Discovering and making sense of relevant literature is fundamental in any scientific field. Node-link diagram-based visualization tools can aid this process; however, existing tools have been evaluated only on small scales. This paper evaluates Argo Scholar, an open-source visualization tool designed for interactive exploration of literature and easy sharing of exploration results. A large-scale user study of 122 participants from diverse backgrounds and experiences showed that Argo Scholar is effective at helping users find related work and understand paper connections, and incremental graph-based exploration is effective across diverse disciplines. Based on the user study and user feedback, we provide design considerations and feature suggestions for future work.

Submitted 24 October, 2022; originally announced October 2022.

Comments: VIS IEEE 22

Showing 1–3 of 3 results for author: Montoya, E