
Showing 1–18 of 18 results for author: Menapace, W

  1. arXiv:2406.19388

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Taming Data and Transformers for Audio Generation

    Authors: Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

    Abstract: Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing two new models. First, we propose AutoCap, a high-quality and efficient automatic audio captioning model. We show that by leveraging metadata available…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Webpage: https://snap-research.github.io/GenAU/

  2. arXiv:2406.07792

    cs.CV

    Hierarchical Patch Diffusion Models for High-Resolution Video Generation

    Authors: Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

    Abstract: Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting scalability and complicating downstream applications. This makes it very efficient during training and unlocks end-to-end optimization on high-resolutio…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  3. arXiv:2406.07472

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.04324

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  5. arXiv:2404.01014

    cs.CV

    Harnessing Large Language Models for Training-free Video Anomaly Detection

    Authors: Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci

    Abstract: Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Existing works mostly rely on training deep models to learn the distribution of normality with either video-level supervision, one-class supervision, or in an unsupervised setting. Training-based methods are prone to be domain-specific, thus being costly for practical deployment as any domain change will involve da…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project website at https://lucazanella.github.io/lavad/

  6. arXiv:2402.19479

    cs.CV

    Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

    Authors: Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

    Abstract: The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, a…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project Page: https://snap-research.github.io/Panda-70M

  7. arXiv:2402.14797

    cs.CV cs.AI

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Authors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

    Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a…

    Submitted 22 February, 2024; originally announced February 2024.

  8. arXiv:2312.01800

    cs.CV

    Collaborative Neural Painting

    Authors: Nicola Dall'Asen, Willi Menapace, Elia Peruzzo, Enver Sangineto, Yiming Wang, Elisa Ricci

    Abstract: The process of painting fosters creativity and rational planning. However, existing generative AI mostly focuses on producing visually pleasant artworks, without emphasizing the painting process. We introduce a novel task, Collaborative Neural Painting (CNP), to facilitate collaborative art painting generation between humans and machines. Given any number of user-input brushstrokes as the context…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Submitted to Computer Vision and Image Understanding, project website at https://fodark.github.io/collaborative-neural-painting/

  9. arXiv:2310.02835

    cs.CV

    Delving into CLIP latent space for Video Anomaly Recognition

    Authors: Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

    Abstract: We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulat…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Submitted to Computer Vision and Image Understanding; project website and code are available at https://luca-zanella-dvl.github.io/AnomalyCLIP/

  10. Interactive Neural Painting

    Authors: Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

    Abstract: In the last few years, Neural Painting (NP) techniques became capable of producing extremely realistic artworks. This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP. Considering a setting where a user looks at a scene and tries to reproduce it on a painting, our objective is to develop a computational framework to assist the…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This is a preprint version of the paper to appear at Computer Vision and Image Understanding (CVIU). The final journal version will be available at https://www.sciencedirect.com/science/article/pii/S1077314223001583

    Journal ref: 10.1016/j.cviu.2023.103778

  11. arXiv:2303.15444

    cs.CV

    Quantum Multi-Model Fitting

    Authors: Matteo Farina, Luca Magri, Willi Menapace, Elisa Ricci, Vladislav Golyanik, Federica Arrigoni

    Abstract: Geometric model fitting is a challenging but fundamental computer vision problem. Recently, quantum optimization has been shown to enhance robust fitting for the case of a single model, while leaving the question of multi-model fitting open. In response to this challenge, this paper shows that the latter case can significantly benefit from quantum hardware and proposes the first quantum approach t…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: In Computer Vision and Pattern Recognition (CVPR) 2023; Highlight

  12. arXiv:2303.13472

    cs.CV cs.AI

    Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

    Authors: Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

    Abstract: Neural video game simulators emerged as powerful tools to generate and edit videos. Their idea is to represent games as the evolution of an environment's state driven by the actions of its agents. While such a paradigm enables users to play a game action-by-action, its rigidity precludes more semantic forms of control. To overcome this limitation, we augment game models with prompts specified as a…

    Submitted 21 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ACM Transactions on Graphics. © Copyright is held by the owner/author(s) 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, http://dx.doi.org/10.1145/3635705

  13. arXiv:2301.11326

    cs.CV

    Unsupervised Volumetric Animation

    Authors: Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

    Abstract: We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns th…

    Submitted 26 January, 2023; originally announced January 2023.

  14. arXiv:2301.09637

    cs.CV cs.AI cs.GR cs.LG

    InfiniCity: Infinite-Scale City Synthesis

    Authors: Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

    Abstract: Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noises. InfiniCity decomposes the seemingly impractical task into three feasible modules, taking advantage of both 2D and 3D data. First, an infinite-pixel image synthesis module generates arbitrary-scale 2D maps from the b…

    Submitted 14 August, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

  15. arXiv:2203.13185

    cs.CV

    Quantum Motion Segmentation

    Authors: Federica Arrigoni, Willi Menapace, Marcel Seelbach Benkner, Elisa Ricci, Vladislav Golyanik

    Abstract: Motion segmentation is a challenging problem that seeks to identify independent motions in two or several input images. This paper introduces the first algorithm for motion segmentation that relies on adiabatic quantum optimization of the objective function. The proposed method achieves on-par performance with the state of the art on problem instances which can be mapped to modern quantum annealer…

    Submitted 24 March, 2022; originally announced March 2022.

  16. arXiv:2203.01914

    cs.CV cs.AI

    Playable Environments: Video Manipulation in Space and Time

    Authors: Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci

    Abstract: We present Playable Environments - a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint.…

    Submitted 15 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  17. arXiv:2101.12195

    cs.CV cs.AI

    Playable Video Generation

    Authors: Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

    Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a nov…

    Submitted 28 January, 2021; originally announced January 2021.

  18. arXiv:2008.04646

    cs.CV cs.LG

    Learning to Cluster under Domain Shift

    Authors: Willi Menapace, Stéphane Lathuilière, Elisa Ricci

    Abstract: While unsupervised domain adaptation methods based on deep architectures have achieved remarkable success in many computer vision tasks, they rely on a strong assumption, i.e. labeled source data must be available. In this work we overcome this assumption and we address the problem of transferring knowledge from a source to a target domain when both source and target data have no annotations. Insp…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: ECCV 2020