Showing 1–3 of 3 results for author: English, Z

  1. arXiv:2403.03206

    cs.CV

    Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

    Authors: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach

    Abstract: Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is n…

    Submitted 5 March, 2024; originally announced March 2024.

  2. arXiv:2311.15127

    cs.CV

    Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

    Abstract: We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary wi…

    Submitted 25 November, 2023; originally announced November 2023.

  3. arXiv:2307.01952

    cs.CV cs.AI

    SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

    Authors: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

    Abstract: We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ra…

    Submitted 4 July, 2023; originally announced July 2023.