Showing 1–22 of 22 results for author: Cole, F

  1. arXiv:2406.06133  [pdf, other]

    cs.CV

    ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models

    Authors: Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian L. Curless, Janne Kontkanen

    Abstract: We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reco…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures, CVPR2024

  2. arXiv:2404.03145  [pdf, other]

    cs.CV

    DreamWalk: Style Space Exploration using Diffusion Guidance

    Authors: Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

    Abstract: Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained cont…

    Submitted 3 April, 2024; originally announced April 2024.

  3. arXiv:2402.08082  [pdf, ps, other]

    stat.ML cs.LG

    Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions

    Authors: Frank Cole, Yulong Lu

    Abstract: While score-based generative models (SGMs) have achieved remarkable success in enormous image generation tasks, their mathematical foundations are still limited. In this paper, we analyze the approximation and generalization of SGMs in learning a family of sub-Gaussian probability distributions. We introduce a notion of complexity for probability distributions in terms of their relative density wi…

    Submitted 23 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 33 pages, to appear in the proceedings of 12th International Conference on Learning Representations

  4. arXiv:2312.03884  [pdf, other]

    cs.CV cs.GR

    WonderJourney: Going from Anywhere to Everywhere

    Authors: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

    Abstract: We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scenes, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes…

    Submitted 12 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project website with video results: https://kovenyu.com/WonderJourney/

  5. arXiv:2311.13600  [pdf, other]

    cs.CV cs.GR cs.LG

    ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

    Authors: Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

    Abstract: Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and su…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Project page: https://ziplora.github.io

  6. arXiv:2306.05428  [pdf, other]

    cs.CV

    Background Prompting for Improved Object Depth

    Authors: Manel Baradad, Yuanzhen Li, Forrester Cole, Michael Rubinstein, Antonio Torralba, William T. Freeman, Varun Jampani

    Abstract: Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts only using…

    Submitted 8 June, 2023; originally announced June 2023.

  7. arXiv:2211.14020  [pdf, other]

    cs.CV

    SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow

    Authors: Itai Lang, Dror Aiger, Forrester Cole, Shai Avidan, Michael Rubinstein

    Abstract: Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations. Recently, there have been efforts to compute the scene flow from 3D point clouds. A common approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. An alternative is to le…

    Submitted 13 April, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page: https://itailang.github.io/SCOOP/

  8. arXiv:2211.11082  [pdf, other]

    cs.CV

    DynIBaR: Neural Dynamic Image-Based Rendering

    Authors: Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely

    Abstract: We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, h…

    Submitted 24 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Award Candidate, CVPR 2023. Project page: dynibar.github.io

  9. arXiv:2205.15838  [pdf, other]

    cs.CV

    D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video

    Authors: Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli

    Abstract: Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence. Existing solutions usually approach this problem in the image domain, limiting their performance and understanding of the environment. We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a…

    Submitted 5 November, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  10. arXiv:2110.11325  [pdf, other]

    cs.CV

    Learning 3D Semantic Segmentation with only 2D Image Supervision

    Authors: Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser

    Abstract: With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast,…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted to 3DV 2021 (Oral)

  11. arXiv:2108.04886  [pdf, other]

    cs.GR cs.CV

    Differentiable Surface Rendering via Non-Differentiable Sampling

    Authors: Forrester Cole, Kyle Genova, Avneesh Sud, Daniel Vlasic, Zhoutong Zhang

    Abstract: We present a method for differentiable rendering of 3D surfaces that supports both explicit and implicit representations, provides derivatives at occlusion boundaries, and is fast and simple to implement. The method first samples the surface using non-differentiable rasterization, then applies differentiable, depth-aware point splatting to produce the final image. Our approach requires no differen…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  12. Consistent Depth of Moving Objects in Video

    Authors: Zhoutong Zhang, Forrester Cole, Richard Tucker, William T. Freeman, Tali Dekel

    Abstract: We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this underconstrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time train…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: Published at SIGGRAPH 2021

    Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 148, August 2021

  13. arXiv:2105.06993  [pdf, other]

    cs.CV

    Omnimatte: Associating Objects and Their Effects in Video

    Authors: Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T. Freeman, Michael Rubinstein

    Abstract: Computer vision is increasingly effective at segmenting objects in images and videos; however, scene effects related to the objects -- shadows, reflections, generated smoke, etc -- are typically overlooked. Identifying such scene effects and associating them with the objects producing them is important for improving our fundamental understanding of visual scenes, and can also assist a variety of a…

    Submitted 30 September, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral. Project webpage: https://omnimatte.github.io/. Added references

  14. arXiv:2105.02976  [pdf, other]

    cs.CV cs.GR

    LASR: Learning Articulated Shape Reconstruction from a Monocular Video

    Authors: Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Huiwen Chang, Deva Ramanan, William T. Freeman, Ce Liu

    Abstract: Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, it is still challenging to reconstruct nonrigid structures from RGB inputs, due to its under-constrained nature. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, they canno…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: CVPR 2021. Project page: https://lasr-google.github.io/

  15. arXiv:2009.07833  [pdf, other]

    cs.CV cs.GR

    Layered Neural Rendering for Retiming People in Video

    Authors: Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein

    Abstract: We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationall…

    Submitted 30 September, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: In SIGGRAPH Asia 2020. Project webpage: https://retiming.github.io/. Added references

  16. arXiv:1912.06126  [pdf, other]

    cs.CV cs.GR

    Local Deep Implicit Functions for 3D Shape

    Authors: Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

    Abstract: The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations. Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a s…

    Submitted 11 June, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Camera ready version for CVPR 2020 Oral. Prior to review, this paper was referred to as DSIF, "Deep Structured Implicit Functions." 11 pages, 9 figures. Project video at https://youtu.be/3RAITzNWVJs

  17. arXiv:1906.07889  [pdf, other]

    cs.CV

    Unsupervised Learning of Object Structure and Dynamics from Videos

    Authors: Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee

    Abstract: Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics model of the keypoints. Future frames are reconstructed from the keypoints and a reference frame. By modeling dynamics in the keypoint coordinate space, we achieve…

    Submitted 2 March, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  18. arXiv:1904.11111  [pdf, other]

    cs.CV

    Learning the Depths of Moving People by Watching Frozen People

    Authors: Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman

    Abstract: We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 (Oral)

  19. arXiv:1904.06447  [pdf, other]

    cs.CV cs.GR

    Learning Shape Templates with Structured Implicit Functions

    Authors: Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser

    Abstract: Template 3D shapes are useful for many tasks in graphics and vision, including fitting observation data, analyzing shape collections, and transferring shape attributes. Because of the variety of geometry and topology of real-world shapes, previous methods generally use a library of hand-made templates. In this paper, we investigate learning a general shape template from data. To allow for widely v…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 12 pages, 9 figures, 4 tables

  20. arXiv:1806.06098  [pdf, other]

    cs.CV

    Unsupervised Training for 3D Morphable Model Regression

    Authors: Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, William T. Freeman

    Abstract: We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objecti…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: CVPR 2018 version with supplemental material (http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)

    Journal ref: Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8377-8386

  21. arXiv:1711.05139  [pdf, other]

    cs.CV

    XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings

    Authors: Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy

    Abstract: Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic cont…

    Submitted 10 July, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: Domain Adaptation for Visual Understanding at ICML'18

  22. arXiv:1701.04851  [pdf, other]

    cs.CV stat.ML

    Synthesizing Normalized Faces from Facial Identity Features

    Authors: Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

    Abstract: We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance,…

    Submitted 17 October, 2017; v1 submitted 17 January, 2017; originally announced January 2017.