Showing 1–2 of 2 results for author: Haji-Ali, M

Searching in archive cs.

  1. arXiv:2406.19388  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Taming Data and Transformers for Audio Generation

    Authors: Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

    Abstract: Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing two new models. First, we propose AutoCap, a high-quality and efficient automatic audio captioning model. We show that by leveraging metadata available…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Webpage: https://snap-research.github.io/GenAU/

  2. arXiv:2311.18822  [pdf, other]

    cs.CV

    ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

    Authors: Moayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez

    Abstract: Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images with various sizes. ElasticDiffusion attempts to decouple the generation trajectory of a pretrained model into local and global…

    Submitted 31 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024. Project Page: https://elasticdiffusion.github.io/