Showing 1–9 of 9 results for author: Schönfeld, E

Searching in archive cs.
  1. arXiv:2405.05224  [pdf, other]

    cs.CV

    Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

    Authors: Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet

    Abstract: Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach compris…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2402.06088  [pdf, other]

    cs.CV

    Animated Stickers: Bringing Stickers to Life with Video Diffusion

    Authors: David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

    Abstract: We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion. Due to the domain gap, i.e. differences in visual and motion style, a model which performed well on generating natural videos can n…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2312.03209  [pdf, other]

    cs.CV

    Cache Me if You Can: Accelerating Diffusion Models through Block Caching

    Authors: Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

    Abstract: Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly. A large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce th…

    Submitted 12 January, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://fwmb.github.io/blockcaching/

  4. arXiv:2212.01455  [pdf, other]

    cs.CV

    Discovering Class-Specific GAN Controls for Semantic Image Synthesis

    Authors: Edgar Schönfeld, Julio Borges, Vadim Sushko, Bernt Schiele, Anna Khoreva

    Abstract: Prior work has extensively studied the latent space structure of GANs for unconditional image synthesis, enabling global editing of generated images by the unsupervised discovery of interpretable latent directions. However, the discovery of latent directions for conditional GANs for semantic image synthesis (SIS) has remained unexplored. In this work, we specifically focus on addressing this gap.…

    Submitted 2 December, 2022; originally announced December 2022.

  5. arXiv:2205.02843  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Generative Adversarial Network Based Synthetic Learning and a Novel Domain Relevant Loss Term for Spine Radiographs

    Authors: Ethan Schonfeld, Anand Veeravagu

    Abstract: Problem: There is a lack of big data for the training of deep learning models in medicine, characterized by the time cost of data collection and privacy concerns. Generative adversarial networks (GANs) offer both the potential to generate new data, as well as to use this newly generated data, without inclusion of patients' real data, for downstream applications. Approach: A series of GANs were t…

    Submitted 4 May, 2022; originally announced May 2022.

  6. arXiv:2205.02841  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures

    Authors: Edward Vendrow, Ethan Schonfeld

    Abstract: The image captioning task is increasingly prevalent in artificial intelligence applications for medicine. One important application is clinical report generation from chest radiographs. The clinical writing of unstructured reports is time consuming and error-prone. An automated system would improve standardization, error reduction, time consumption, and medical accessibility. In this paper we demo…

    Submitted 4 May, 2022; originally announced May 2022.

  7. arXiv:2012.04781  [pdf, other]

    cs.CV cs.LG eess.IV

    You Only Need Adversarial Supervision for Semantic Image Synthesis

    Authors: Vadim Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva

    Abstract: Despite their recent successes, GAN models for semantic image synthesis still suffer from poor image quality when trained with only adversarial supervision. Historically, additionally employing the VGG-based perceptual loss has helped to overcome this issue, significantly improving the synthesis quality, but at the same time limiting the progress of GAN models for semantic image synthesis. In this…

    Submitted 19 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Published at ICLR 2021 (Main Conference). Code repository: https://github.com/boschresearch/OASIS

  8. arXiv:2002.12655  [pdf, other]

    cs.CV cs.LG eess.IV

    A U-Net Based Discriminator for Generative Adversarial Networks

    Authors: Edgar Schönfeld, Bernt Schiele, Anna Khoreva

    Abstract: Among the major remaining challenges for generative adversarial networks (GANs) is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature. The proposed U-Net based architecture all…

    Submitted 19 March, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

    Comments: CVPR 2020 (Main Conference). Code repository: https://github.com/boschresearch/unetgan

  9. arXiv:1812.01784  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

    Authors: Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

    Abstract: Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work,…

    Submitted 5 April, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: Accepted at CVPR 2019