
Showing 1–25 of 25 results for author: Holynski, A

  1. arXiv:2406.09417  [pdf, other]

    cs.CV cs.GR cs.LG

    Rethinking Score Distillation as a Bridge Between Image Distributions

    Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sds-bridge.github.io/

  2. arXiv:2406.06133  [pdf, other]

    cs.CV

    ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models

    Authors: Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian L. Curless, Janne Kontkanen

    Abstract: We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reco…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures, CVPR2024

  3. arXiv:2405.10314  [pdf, other]

    cs.CV

    CAT3D: Create Anything in 3D with Multi-View Diffusion Models

    Authors: Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole

    Abstract: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent nov…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://cat3d.github.io

  4. arXiv:2405.08210  [pdf, other]

    cs.CV

    Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

    Authors: Yifan Wang, Aleksander Holynski, Brian L. Curless, Steven M. Seitz

    Abstract: We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At gener…

    Submitted 13 May, 2024; originally announced May 2024.

  5. arXiv:2404.01203  [pdf, other]

    cs.CV

    Video Interpolation with Diffusion Models

    Authors: Siddhant Jain, Daniel Watson, Eric Tabellion, Aleksander Hołyński, Ben Poole, Janne Kontkanen

    Abstract: We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video. We compare VIDI…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, Project page at https://vidim-interpolation.github.io/

  6. arXiv:2402.16936  [pdf, other]

    cs.CV cs.LG

    Disentangled 3D Scene Generation with Layout Learning

    Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski

    Abstract: We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jo…

    Submitted 26 February, 2024; originally announced February 2024.

  7. arXiv:2312.04560  [pdf, other]

    cs.CV cs.AI cs.GR

    NeRFiller: Completing Scenes via Generative 3D Inpainting

    Authors: Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

    Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpaintin…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://ethanweber.me/nerfiller

  8. arXiv:2312.02981  [pdf, other]

    cs.CV

    ReconFusion: 3D Reconstruction with Diffusion Priors

    Authors: Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Holynski

    Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for nove…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://reconfusion.github.io/

  9. arXiv:2312.02150  [pdf, other]

    cs.CV

    Readout Guidance: Learning Control from Diffusion Features

    Authors: Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski

    Abstract: We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals. Readout Guidance uses readout heads, lightweight networks trained to extract signals from the features of a pre-trained, frozen diffusion model at every timestep. These readouts can encode single-image properties, such as pose, depth, and edges; or higher-order properties that relate multiple…

    Submitted 2 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  10. arXiv:2312.02149  [pdf, other]

    cs.CV cs.AI cs.CL cs.GR

    Generative Powers of Ten

    Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski

    Abstract: We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different…

    Submitted 21 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://powers-of-10.github.io/

  11. arXiv:2310.07204  [pdf, other]

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  12. arXiv:2309.16668  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    RealFill: Reference-Driven Generation for Authentic Image Completion

    Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

    Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of a…

    Submitted 14 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH 2024 (Journal Track). Project page: https://realfill.github.io

  13. arXiv:2309.07906  [pdf, other]

    cs.CV

    Generative Image Dynamics

    Authors: Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski

    Abstract: We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain: given a single image, our trained model uses a frequency-…

    Submitted 14 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Project website: http://generative-dynamics.github.io

  14. arXiv:2306.05422  [pdf, other]

    cs.CV

    Tracking Everything Everywhere All at Once

    Authors: Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, Noah Snavely

    Abstract: We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbe…

    Submitted 12 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: ICCV 2023

  15. arXiv:2306.00986  [pdf, other]

    cs.CV cs.LG stat.ML

    Diffusion Self-Guidance for Controllable Image Generation

    Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski

    Abstract: Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, locati…

    Submitted 11 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Project page at https://dave.ml/selfguidance/

  16. arXiv:2305.14334  [pdf, other]

    cs.CV

    Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

    Authors: Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell

    Abstract: Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hype…

    Submitted 1 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  17. arXiv:2304.10532  [pdf, other]

    cs.CV cs.AI cs.GR

    Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

    Authors: Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation pro…

    Submitted 17 October, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: ICCV 2023, project page: https://ethanweber.me/nerfbusters

  18. arXiv:2304.06025  [pdf, other]

    cs.CV

    DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

    Authors: Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman

    Abstract: We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a…

    Submitted 30 October, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Project page: https://grail.cs.washington.edu/projects/dreampose/

  19. arXiv:2303.12789  [pdf, other]

    cs.CV cs.GR

    Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

    Authors: Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: We propose a method for editing NeRF scenes with text-instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our proposed meth…

    Submitted 1 June, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Project website: https://instruct-nerf2nerf.github.io; v1. Revisions to related work and discussion

  20. arXiv:2211.09800  [pdf, other]

    cs.CV cs.AI cs.CL cs.GR cs.LG

    InstructPix2Pix: Learning to Follow Image Editing Instructions

    Authors: Tim Brooks, Aleksander Holynski, Alexei A. Efros

    Abstract: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large datase…

    Submitted 18 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Project page with code: https://www.timothybrooks.com/instruct-pix2pix

  21. arXiv:2204.03648  [pdf, other]

    cs.CV

    SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage

    Authors: Yifan Wang, Aleksander Holynski, Xiuming Zhang, Xuaner Zhang

    Abstract: A light stage uses a series of calibrated cameras and lights to capture a subject's facial appearance under varying illumination and viewpoint. This captured information is crucial for facial reconstruction and relighting. Unfortunately, light stages are often inaccessible: they are expensive and require significant technical expertise for construction and operation. In this paper, we present SunS…

    Submitted 24 March, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR 2023. Project page: https://sunstage.cs.washington.edu/

  22. arXiv:2011.15128  [pdf, other]

    cs.CV cs.GR

    Animating Pictures with Eulerian Motion Fields

    Authors: Aleksander Holynski, Brian Curless, Steven M. Seitz, Richard Szeliski

    Abstract: In this paper, we demonstrate a fully automatic method for converting a still image into a realistic animated looping video. We target scenes with continuous fluid motion, such as flowing water and billowing smoke. Our method relies on the observation that this type of natural motion can be convincingly reproduced from a static Eulerian motion description, i.e. a single, temporally constant flow f…

    Submitted 30 November, 2020; originally announced November 2020.

  23. arXiv:2008.12295  [pdf, other]

    cs.CV

    Reducing Drift in Structure From Motion Using Extended Features

    Authors: Aleksander Holynski, David Geraghty, Jan-Michael Frahm, Chris Sweeney, Richard Szeliski

    Abstract: Low-frequency long-range errors (drift) are an endemic problem in 3D structure from motion, and can often hamper reasonable reconstructions of the scene. In this paper, we present a method to dramatically reduce scale and positional drift by using extended structural features such as planes and vanishing points. Unlike traditional feature matches, our extended features are able to span non-overlap…

    Submitted 13 October, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: 3DV 2020

  24. arXiv:2001.04642  [pdf, other]

    cs.CV

    Seeing the World in a Bag of Chips

    Authors: Jeong Joon Park, Aleksander Holynski, Steve Seitz

    Abstract: We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors. Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. In cases where scene surface has a strong mirror-like material comp…

    Submitted 15 June, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

    Comments: CVPR 2020

  25. arXiv:1906.03539  [pdf, other]

    cs.CV

    Structure from Motion for Panorama-Style Videos

    Authors: Chris Sweeney, Aleksander Holynski, Brian Curless, Steve M Seitz

    Abstract: We present a novel Structure from Motion pipeline that is capable of reconstructing accurate camera poses for panorama-style video capture without prior camera intrinsic calibration. While panorama-style capture is common and convenient, previous reconstruction methods fail to obtain accurate reconstructions due to the rotation-dominant motion and small baseline between views. Our method is built…

    Submitted 8 June, 2019; originally announced June 2019.