
Showing 1–42 of 42 results for author: Siarohin, A

Searching in archive cs.
  1. arXiv:2406.19388  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Taming Data and Transformers for Audio Generation

    Authors: Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

    Abstract: Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing two new models. First, we propose AutoCap, a high-quality and efficient automatic audio captioning model. We show that by leveraging metadata available…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Webpage: https://snap-research.github.io/GenAU/

  2. arXiv:2406.07792  [pdf, other]

    cs.CV

    Hierarchical Patch Diffusion Models for High-Resolution Video Generation

    Authors: Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

    Abstract: Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting scalability and complicating downstream applications. This makes it very efficient during training and unlocks end-to-end optimization on high-resolutio…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  3. arXiv:2406.07472  [pdf, other]

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.05649  [pdf, other]

    cs.CV cs.AI

    GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

    Authors: Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, in our method, we introduce several important modifications that allow us to significantly enhance 3D reconstruction quali…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 19 pages, 17 figures. Project page: https://snap-research.github.io/GTR/

  5. arXiv:2406.04324  [pdf, other]

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  6. arXiv:2402.19479  [pdf, other]

    cs.CV

    Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

    Authors: Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

    Abstract: The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, a…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project Page: https://snap-research.github.io/Panda-70M

  7. arXiv:2402.14797  [pdf, other]

    cs.CV cs.AI

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Authors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

    Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a…

    Submitted 22 February, 2024; originally announced February 2024.

  8. arXiv:2402.05235  [pdf, other]

    cs.CV

    SPAD: Spatially Aware Multiview Diffusers

    Authors: Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin

    Abstract: We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g. MVD…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Webpage: https://yashkant.github.io/spad

  9. arXiv:2402.00867  [pdf, other]

    cs.CV

    AToM: Amortized Text-to-Mesh using 2D Diffusion

    Authors: Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

    Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around 10 times re…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/

  10. arXiv:2401.05583  [pdf, other]

    cs.CV

    Diffusion Priors for Dynamic View Synthesis from Monocular Videos

    Authors: Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov

    Abstract: Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos. Existing methods struggle to distinguish between motion and structure, particularly in scenarios where camera poses are either unknown or constrained compared to object motion. Furthermore, with information solely from reference images, it is extremely challenging to hallucinate unseen regions t…

    Submitted 10 January, 2024; originally announced January 2024.

  11. arXiv:2312.08885  [pdf, other]

    cs.CV

    SceneWiz3D: Towards Text-guided 3D Scene Composition

    Authors: Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce Scen…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Project page: https://zqh0253.github.io/SceneWiz3D/

  12. arXiv:2310.16167  [pdf, other]

    cs.CV

    iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

    Authors: Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

    Abstract: We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaver…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to SIGGRAPH Asia, 2023 (Conference Papers)

  13. arXiv:2310.08579  [pdf, other]

    cs.CV

    HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

    Authors: Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

    Abstract: Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing models like Stable Diffusion and DALL-E 2 tend to generate human images with incoherent parts or unnatural poses. To tackle these challenges, our key insight is that human image is inherently structural over multiple granularities, from…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024, camera-ready version. Project Page: https://snap-research.github.io/HyperHuman/

  14. arXiv:2307.05445  [pdf, other]

    cs.CV

    AutoDecoding Latent 3D Diffusion Models

    Authors: Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov

    Abstract: We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent s…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Project page: https://snap-research.github.io/3DVADER/

  15. arXiv:2307.03190  [pdf, other]

    cs.CV cs.GR cs.LG

    Text-Guided Synthesis of Eulerian Cinemagraphs

    Authors: Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu

    Abstract: We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion…

    Submitted 25 September, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Project website: https://text2cinemagraph.github.io/website/

  16. arXiv:2306.17843  [pdf, other]

    cs.CV

    Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

    Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

    Abstract: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing…

    Submitted 23 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: webpage: https://guochengqian.github.io/project/magic123/

  17. arXiv:2303.13472  [pdf, other]

    cs.CV cs.AI

    Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

    Authors: Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

    Abstract: Neural video game simulators emerged as powerful tools to generate and edit videos. Their idea is to represent games as the evolution of an environment's state driven by the actions of its agents. While such a paradigm enables users to play a game action-by-action, its rigidity precludes more semantic forms of control. To overcome this limitation, we augment game models with prompts specified as a…

    Submitted 21 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ACM Transactions on Graphics © Copyright is held by the owner/author(s) 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, http://dx.doi.org/10.1145/3635705

  18. arXiv:2303.01416  [pdf, other]

    cs.CV cs.AI cs.GR

    3D generation on ImageNet

    Authors: Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

    Abstract: Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D gen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 (Oral)

    Journal ref: ICLR 2023

  19. arXiv:2302.09227  [pdf, other]

    cs.CV cs.GR

    Invertible Neural Skinning

    Authors: Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

    Abstract: Building animatable and editable models of clothed humans from raw 3D scans and poses is a challenging problem. Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (…

    Submitted 4 March, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

  20. arXiv:2301.11326  [pdf, other]

    cs.CV

    Unsupervised Volumetric Animation

    Authors: Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

    Abstract: We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns th…

    Submitted 26 January, 2023; originally announced January 2023.

  21. arXiv:2301.09637  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    InfiniCity: Infinite-Scale City Synthesis

    Authors: Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

    Abstract: Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noises. InfiniCity decomposes the seemingly impractical task into three feasible modules, taking advantage of both 2D and 3D data. First, an infinite-pixel image synthesis module generates arbitrary-scale 2D maps from the b…

    Submitted 14 August, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

  22. arXiv:2301.02700  [pdf, other]

    cs.CV cs.GR

    3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

    Authors: Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

    Abstract: Modern 3D-GANs synthesize geometry and texture by training on large-scale datasets with a consistent structure. Training such models on stylized, artistic data, with often unknown, highly variable geometry, and camera information has not yet been shown possible. Can we train a 3D GAN on such artistic data, while maintaining multi-view consistency and texture quality? To this end, we propose an ada…

    Submitted 26 March, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: Project Page: https://rameenabdal.github.io/3DAvatarGAN/

  23. arXiv:2212.11984  [pdf, other]

    cs.CV

    DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

    Authors: Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

    Abstract: Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Project page: https://snap-research.github.io/discoscene/

  24. arXiv:2208.12550  [pdf, other]

    cs.CV cs.GR

    Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation

    Authors: Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang

    Abstract: Generative Neural Radiance Fields (GNeRF) based 3D-aware GANs have demonstrated remarkable capabilities in generating high-quality images while maintaining strong 3D consistency. Notably, significant advancements have been made in the domain of face generation. However, most existing models prioritize view consistency over disentanglement, resulting in limited semantic/attribute control during gen…

    Submitted 18 October, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: 13 pages

  25. arXiv:2203.01914  [pdf, other]

    cs.CV cs.AI

    Playable Environments: Video Manipulation in Space and Time

    Authors: Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci

    Abstract: We present Playable Environments - a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint.…

    Submitted 15 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  26. arXiv:2112.01422  [pdf, other]

    cs.CV

    3D-Aware Semantic-Guided Generative Model for Human Synthesis

    Authors: Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, Wei Wang

    Abstract: Generative Neural Radiance Field (GNeRF) models, which extract implicit 3D representations from 2D images, have recently been shown to produce realistic images representing rigid/semi-rigid objects, such as human faces or cars. However, they usually struggle to generate high-quality images representing non-rigid objects, such as the human body, which is of great interest for many computer graphi…

    Submitted 17 July, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: ECCV 2022. 29 pages

  27. arXiv:2105.14739  [pdf, other]

    cs.CV cs.GR

    Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization

    Authors: Jichao Zhang, Aliaksandr Siarohin, Hao Tang, Enver Sangineto, Wei Wang, Humphrey Shi, Nicu Sebe

    Abstract: Controllable person image generation aims to produce realistic human images with desirable attributes such as a given pose, cloth textures, or hairstyles. However, the large spatial misalignment between source and target images makes the standard image-to-image translation architectures unsuitable for this task. Most state-of-the-art methods focus on alignment for global pose-transfer tasks. Howev…

    Submitted 9 January, 2023; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 12 pages

  28. arXiv:2104.11280  [pdf, other]

    cs.CV

    Motion Representations for Articulated Animation

    Authors: Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov

    Abstract: We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to the previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, sh…

    Submitted 22 April, 2021; originally announced April 2021.

    Journal ref: CVPR 2021

  29. arXiv:2101.12195  [pdf, other]

    cs.CV cs.AI

    Playable Video Generation

    Authors: Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

    Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a nov…

    Submitted 28 January, 2021; originally announced January 2021.

  30. arXiv:2007.06346  [pdf, other]

    cs.LG cs.CV stat.ML

    Whitening for Self-Supervised Representation Learning

    Authors: Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe

    Abstract: Most of the current self-supervised representation learning (SSL) methods are based on the contrastive loss and the instance-discrimination task, where augmented versions of the same image instance ("positives") are contrasted with instances extracted from other images ("negatives"). For the learning to be effective, many negatives should be compared with a positive pair, which is computationally…

    Submitted 14 May, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: ICML 2021

  31. TriGAN: Image-to-Image Translation for Multi-Source Domain Adaptation

    Authors: Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe, Elisa Ricci

    Abstract: Most domain adaptation methods consider the problem of transferring knowledge to the target domain from a single source dataset. However, in practical applications, we typically have access to multiple sources. In this paper we propose the first approach for Multi-Source Domain Adaptation (MSDA) based on Generative Adversarial Networks. Our method is inspired by the observation that the appearance…

    Submitted 19 April, 2020; originally announced April 2020.

    Journal ref: Machine Vision and Applications 2021

  32. arXiv:2004.03234  [pdf, other]

    cs.CV

    Motion-supervised Co-Part Segmentation

    Authors: Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: Recent co-part segmentation methods mostly operate in a supervised learning setting, which requires a large amount of annotated data for training. To overcome this limitation, we propose a self-supervised deep learning method for co-part segmentation. Differently from previous works, our approach develops the idea that motion information inferred from videos can be leveraged to discover meaningful…

    Submitted 15 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Journal ref: ICPR 2021

  33. arXiv:2003.00196  [pdf, other]

    cs.CV cs.AI

    First Order Motion Model for Image Animation

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to…

    Submitted 1 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: NeurIPS 2019

  34. arXiv:1910.09139  [pdf, other]

    cs.CV cs.LG

    DwNet: Dense warp-based network for pose-guided human video generation

    Authors: Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal

    Abstract: Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this paper, we focus on human motion transfer - generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages dense intermediate pose-g…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted to BMVC 2019

  35. arXiv:1905.02655  [pdf, other]

    cs.CV

    Attention-based Fusion for Multi-source Human Image Generation

    Authors: Stéphane Lathuilière, Enver Sangineto, Aliaksandr Siarohin, Nicu Sebe

    Abstract: We present a generalization of the person-image generation task, in which a human image is generated conditioned on a target pose and a set X of source appearance images. In this way, we can exploit multiple, possibly complementary images of the same person which are usually available at training and at testing time. The solution we propose is mainly based on a local attention mechanism which sele…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: 10 pages

  36. arXiv:1905.00007  [pdf, other]

    cs.CV

    Appearance and Pose-Conditioned Human Image Generation using Deformable GANs

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, Nicu Sebe

    Abstract: In this paper, we address the problem of generating person images conditioned on both pose and appearance information. Specifically, given an image xa of a person and a target pose P(xb), extracted from a different image xb, we synthesize a new image of that person in pose P(xb), while preserving the visual details in xa. In order to deal with pixel-to-pixel misalignments caused by the pose differ…

    Submitted 14 October, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

    Comments: To appear in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1801.00055

  37. arXiv:1903.03215  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation using Feature-Whitening and Consensus Loss

    Authors: Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, Elisa Ricci

    Abstract: A classifier trained on a dataset seldom works on other datasets obtained under different conditions due to domain shift. This problem is commonly addressed by domain adaptation methods. In this work we introduce a novel deep learning framework which unifies different paradigms in unsupervised domain adaptation. Specifically, we propose domain alignment layers which implement feature whitening for…

    Submitted 16 February, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: CVPR 2019

  38. arXiv:1812.08861  [pdf, other]

    cs.GR cs.CV cs.LG stat.ML

    Animating Arbitrary Objects via Deep Motion Transfer

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: This paper introduces a novel deep learning framework for image animation. Given an input image with a target object and a driving video sequence depicting a moving object, our framework generates a video in which the target object is animated according to the driving sequence. This is achieved through a deep architecture that decouples appearance and motion information. Our framework consists of…

    Submitted 30 August, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: CVPR-2019 (oral)

  39. arXiv:1812.00717  [pdf, other]

    stat.ML cs.LG

    Enhancing Perceptual Attributes with Bayesian Style Generation

    Authors: Aliaksandr Siarohin, Gloria Zen, Nicu Sebe, Elisa Ricci

    Abstract: Deep learning has brought unprecedented progress in computer vision and significant advances have been made in predicting subjective properties inherent to visual data (e.g., memorability, aesthetic quality, evoked emotions, etc.). Recently, some research works have even proposed deep learning approaches to modify images so as to appropriately alter these properties. Following this research l…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: ACCV-2018

  40. arXiv:1806.00420  [pdf, other]

    stat.ML cs.LG

    Whitening and Coloring batch transform for GANs

    Authors: Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe

    Abstract: Batch Normalization (BN) is a common technique used to speed up and stabilize training. On the other hand, the learnable parameters of BN are commonly used in conditional Generative Adversarial Networks (cGANs) for representing class-specific information using conditional Batch Normalization (cBN). In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch no…

    Submitted 25 February, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: ICLR 2019

  41. arXiv:1801.00055  [pdf, other]

    cs.CV

    Deformable GANs for Pose-based Human Image Generation

    Authors: Aliaksandr Siarohin, Enver Sangineto, Stephane Lathuiliere, Nicu Sebe

    Abstract: In this paper we address the problem of generating person images conditioned on a given pose. Specifically, given an image of a person and a target pose, we synthesize a new image of that person in the novel pose. In order to deal with pixel-to-pixel misalignments caused by the pose differences, we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreov…

    Submitted 6 April, 2018; v1 submitted 29 December, 2017; originally announced January 2018.

    Comments: CVPR 2018 version

  42. arXiv:1704.01745  [pdf, other]

    cs.CV

    How to Make an Image More Memorable? A Deep Style Transfer Approach

    Authors: Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe

    Abstract: Recent works have shown that it is possible to automatically predict intrinsic image properties like memorability. In this paper, we take a step forward addressing the question: "Can we make an image more memorable?". Methods for automatically increasing image memorability would have an impact in many application fields like education, gaming or advertising. Our work is inspired by the popular edi…

    Submitted 6 April, 2017; originally announced April 2017.

    Comments: Accepted at ACM ICMR 2017