Showing 1–25 of 25 results for author: Skorokhodov, I

  1. arXiv:2406.12831  [pdf, other]

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  2. arXiv:2406.07792  [pdf, other]

    cs.CV

    Hierarchical Patch Diffusion Models for High-Resolution Video Generation

    Authors: Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

    Abstract: Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting scalability and complicating downstream applications. This makes it very efficient during training and unlocks end-to-end optimization on high-resolutio…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  3. arXiv:2406.04324  [pdf, other]

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  4. arXiv:2403.17920  [pdf, other]

    cs.CV

    TC4D: Trajectory-Conditioned Text-to-4D Generation

    Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate: they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The la…

    Submitted 10 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sherwinbahmani.github.io/tc4d

  5. arXiv:2402.14797  [pdf, other]

    cs.CV cs.AI

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Authors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

    Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a…

    Submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2402.00867  [pdf, other]

    cs.CV

    AToM: Amortized Text-to-Mesh using 2D Diffusion

    Authors: Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

    Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around 10 times re…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/

  7. arXiv:2311.17984  [pdf, other]

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

  8. arXiv:2310.08579  [pdf, other]

    cs.CV

    HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

    Authors: Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

    Abstract: Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing models like Stable Diffusion and DALL-E 2 tend to generate human images with incoherent parts or unnatural poses. To tackle these challenges, our key insight is that human image is inherently structural over multiple granularities, from…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024, camera-ready version. Project Page: https://snap-research.github.io/HyperHuman/

  9. arXiv:2308.12366  [pdf, other]

    cs.CV

    Continual Zero-Shot Learning through Semantically Guided Generative Random Walks

    Authors: Wenxuan Zhang, Paul Janson, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

    Abstract: Learning novel concepts, remembering previous knowledge, and adapting it to future tasks occur simultaneously throughout a human's lifetime. To model such comprehensive abilities, continual zero-shot learning (CZSL) has recently been introduced. However, most existing methods overused unseen semantic information that may not be continually accessible in realistic settings. In this paper, we addres…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  10. arXiv:2306.17843  [pdf, other]

    cs.CV

    Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

    Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

    Abstract: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing…

    Submitted 23 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: webpage: https://guochengqian.github.io/project/magic123/

  11. arXiv:2305.17929  [pdf, other]

    cs.CV cs.AI cs.GR

    Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

    Authors: Yue Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang

    Abstract: We develop a method that recovers the surface, materials, and illumination of a scene from its posed multi-view images. In contrast to prior work, it does not require any additional data and can handle glossy objects or bright lighting. It is a progressive inverse rendering approach, which consists of three stages. First, we reconstruct the scene radiance and signed distance function (SDF) with ou…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 12 pages, 10 figures. Project page: https://authors-hub.github.io/Factored-NeuS

  12. arXiv:2305.05594  [pdf, other]

    cs.CV cs.AI cs.GR

    PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces

    Authors: Yiqun Wang, Ivan Skorokhodov, Peter Wonka

    Abstract: A signed distance function (SDF) parametrized by an MLP is a common ingredient of neural surface reconstruction. We build on the successful recent method NeuS to extend it by three new components. The first component is to borrow the tri-plane representation from EG3D and represent signed distance fields as a mixture of tri-planes and MLPs instead of representing it with MLPs only. Using tri-plane…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; 20 pages; Project page: https://github.com/yiqun-wang/PET-NeuS

  13. arXiv:2304.04909  [pdf, other]

    cs.CV

    SATR: Zero-Shot Semantic Segmentation of 3D Shapes

    Authors: Ahmed Abdelreheem, Ivan Skorokhodov, Maks Ovsjanikov, Peter Wonka

    Abstract: We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models. Surprisingly, we find that modern zero-shot 2D object detectors are better suited for this task than contemporary text/image similarity predictors or even zero-shot 2D segmentation networks. Our key finding is that it is possible to extract accurate 3D segmentation ma…

    Submitted 20 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Project webpage: https://samir55.github.io/SATR/

  14. arXiv:2303.01416  [pdf, other]

    cs.CV cs.AI cs.GR

    3D generation on ImageNet

    Authors: Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

    Abstract: Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D gen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 (Oral)

    Journal ref: ICLR 2023

  15. arXiv:2301.11326  [pdf, other]

    cs.CV

    Unsupervised Volumetric Animation

    Authors: Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

    Abstract: We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns th…

    Submitted 26 January, 2023; originally announced January 2023.

  16. arXiv:2212.11984  [pdf, other]

    cs.CV

    DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

    Authors: Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

    Abstract: Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Project page: https://snap-research.github.io/discoscene/

  17. arXiv:2206.10535  [pdf, other]

    cs.CV cs.AI cs.LG

    EpiGRAF: Rethinking training of 3D GANs

    Authors: Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka

    Abstract: A very recent trend in generative modeling is building 3D-aware generators from 2D image collections. To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions. During the past months, there appeared more than 10 works that address this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature…

    Submitted 15 December, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  18. arXiv:2206.07850  [pdf, other]

    cs.CV cs.GR

    HF-NeuS: Improved Surface Reconstruction Using High-Frequency Details

    Authors: Yiqun Wang, Ivan Skorokhodov, Peter Wonka

    Abstract: Neural rendering can be used to reconstruct implicit representations of shapes without 3D supervision. However, current neural surface reconstruction methods have difficulty learning high-frequency geometry details, so the reconstructed shapes are often over-smoothed. We develop HF-NeuS, a novel method to improve the quality of surface reconstruction in neural rendering. We follow recent work to m…

    Submitted 22 September, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: To appear in NeurIPS 2022. Project page: https://github.com/yiqun-wang/HFS

  19. arXiv:2112.14683  [pdf, other]

    cs.CV cs.AI cs.LG

    StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

    Authors: Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

    Abstract: Videos show continuous events, yet most (if not all) video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be, time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. T…

    Submitted 31 May, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  20. arXiv:2104.09757  [pdf, other]

    cs.CV cs.AI

    Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation

    Authors: Divyansh Jha, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny

    Abstract: We propose a novel loss for generative models, dubbed as GRaWD (Generative Random Walk Deviation), to improve learning representations of unexplored visual spaces. Quality learning representation of unseen classes (or styles) is critical to facilitate novel image generation and better generative understanding of unseen visual classes, i.e., zero-shot learning (ZSL). By generating representations o…

    Submitted 24 September, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Project homepage: https://imaginative-walks.github.io

  21. arXiv:2104.06954  [pdf, other]

    cs.CV cs.AI

    Aligning Latent and Image Spaces to Connect the Unconnectable

    Authors: Ivan Skorokhodov, Grigorii Sotnikov, Mohamed Elhoseiny

    Abstract: In this work, we develop a method to generate infinite high-resolution images with diverse and complex content. It is based on a perfectly equivariant generator with synchronous interpolations in the image and latent spaces. Latent codes, when sampled, are positioned on the coordinate grid, and each pixel is computed from an interpolation of the nearby style codes. We modify the AdaIN mechanism to…

    Submitted 14 April, 2021; originally announced April 2021.

  22. arXiv:2012.13257  [pdf, other]

    cs.CV cs.GR

    Interpolating Points on a Non-Uniform Grid using a Mixture of Gaussians

    Authors: Ivan Skorokhodov

    Abstract: In this work, we propose an approach to perform non-uniform image interpolation based on a Gaussian Mixture Model. Traditional image interpolation methods, like nearest neighbor, bilinear, Hamming, Lanczos, etc., assume that the coordinates you want to interpolate from are positioned on a uniform grid. However, this is not always the case in practice, and we develop an interpolation method that is ab…

    Submitted 24 December, 2020; originally announced December 2020.

    Comments: 5 figures, 2 equations

  23. arXiv:2011.12026  [pdf, other]

    cs.CV cs.AI cs.LG

    Adversarial Generation of Continuous Images

    Authors: Ivan Skorokhodov, Savva Ignatyev, Mohamed Elhoseiny

    Abstract: In most existing learning systems, images are typically viewed as 2D pixel arrays. However, in another paradigm gaining popularity, a 2D image is represented as an implicit neural representation (INR) - an MLP that predicts an RGB pixel value given its (x,y) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders: factorized multiplicative mod…

    Submitted 28 June, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: 19 pages, 17 figures

  24. arXiv:2006.11328  [pdf, other]

    cs.LG cs.CV stat.ML

    Class Normalization for (Continual)? Generalized Zero-Shot Learning

    Authors: Ivan Skorokhodov, Mohamed Elhoseiny

    Abstract: Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime. However, in the zero-shot learning (ZSL) world, these ideas have received only marginal attention. This work studies normalization in ZSL scenario from both theoretical and practical perspectives. First, we give a theoretical explanation to two popular tricks used in…

    Submitted 14 April, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 22 pages, 7 figures, 7 tables

  25. arXiv:1910.03867  [pdf, other]

    cs.LG stat.ML

    Loss Landscape Sightseeing with Multi-Point Optimization

    Authors: Ivan Skorokhodov, Mikhail Burtsev

    Abstract: We present multi-point optimization: an optimization technique that allows training several models simultaneously without the need to keep the parameters of each one individually. The proposed method is used for a thorough empirical analysis of the loss landscape of neural networks. By extensive experiments on FashionMNIST and CIFAR10 datasets we demonstrate two things: 1) loss surface is surprisi…

    Submitted 14 October, 2019; v1 submitted 9 October, 2019; originally announced October 2019.