
Showing 1–50 of 93 results for author: Freeman, W T

  1. arXiv:2406.09394  [pdf, other]

    cs.CV cs.GR

    WonderWorld: Interactive 3D Scene Generation from a Single Image

    Authors: Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu

    Abstract: We present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods are run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project website: https://WonderWorld-2024.github.io/

  2. arXiv:2406.05629  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG cs.SD eess.AS

    Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

    Authors: Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman

    Abstract: We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the "meaning" of words and the "location" of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Computer Vision and Pattern Recognition 2024

  3. arXiv:2406.02785  [pdf, other]

    astro-ph.IM cs.LG eess.IV

    Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors

    Authors: Berthy T. Feng, Katherine L. Bouman, William T. Freeman

    Abstract: Reconstructing images from the Event Horizon Telescope (EHT) observations of M87*, the supermassive black hole at the center of the galaxy M87, depends on a prior to impose desired image statistics. However, given the impossibility of directly observing black holes, there is no clear choice for a prior. We present a framework for flexibly designing a range of priors, each bringing different biases…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.14867  [pdf, other]

    cs.CV

    Improved Distribution Matching Distillation for Fast Image Synthesis

    Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman

    Abstract: Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Code, model, and dataset are available at https://tianweiy.github.io/dmd2

  5. arXiv:2404.13026  [pdf, other]

    cs.CV cs.AI

    PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

    Authors: Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman

    Abstract: Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Project website at: https://physdreamer.github.io/

  6. arXiv:2403.10516  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    FeatUp: A Model-Agnostic Framework for Features at Any Resolution

    Authors: Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, William T. Freeman

    Abstract: Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we in…

    Submitted 1 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to the International Conference on Learning Representations (ICLR) 2024

  7. arXiv:2312.08715  [pdf, other]

    cs.RO

    Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes

    Authors: Nishad Gothoskar, Matin Ghavami, Eric Li, Aidan Curtis, Michael Noseworthy, Karen Chung, Brian Patton, William T. Freeman, Joshua B. Tenenbaum, Mirko Klukas, Vikash K. Mansinghka

    Abstract: Robots cannot yet match humans' ability to rapidly learn the shapes of novel 3D objects and recognize them robustly despite clutter and occlusion. We present Bayes3D, an uncertainty-aware perception system for structured 3D scenes, that reports accurate posterior uncertainty over 3D object shape, pose, and scene composition in the presence of clutter and occlusion. Bayes3D delivers these capabilit…

    Submitted 14 December, 2023; originally announced December 2023.

  8. arXiv:2312.03884  [pdf, other]

    cs.CV cs.GR

    WonderJourney: Going from Anywhere to Everywhere

    Authors: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

    Abstract: We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes…

    Submitted 12 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project website with video results: https://kovenyu.com/WonderJourney/

  9. arXiv:2312.02970  [pdf, other]

    cs.CV cs.AI cs.GR

    Alchemist: Parametric Control of Material Properties with Diffusion Models

    Authors: Prafull Sharma, Varun Jampani, Yuanzhen Li, Xuhui Jia, Dmitry Lagun, Fredo Durand, William T. Freeman, Mark Matthews

    Abstract: We propose a method to control material attributes of objects like roughness, metallic, albedo, and transparency in real images. Our method capitalizes on the generative prior of text-to-image models known for photorealism, employing a scalar value and instructions to alter low-level material properties. Addressing the lack of datasets with controlled material attributes, we generated an object-ce…

    Submitted 5 December, 2023; originally announced December 2023.

  10. arXiv:2311.18828  [pdf, other]

    cs.CV

    One-step Diffusion with Distribution Matching Distillation

    Authors: Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park

    Abstract: Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level, by minimizing an approximate KL divergence whose gradient c…

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://tianweiy.github.io/dmd/

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  11. arXiv:2309.03926  [pdf, other]

    cs.SD cs.AI cs.DC cs.DL cs.LG eess.AS

    Large-Scale Automatic Audiobook Creation

    Authors: Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer

    Abstract: An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release…

    Submitted 7 September, 2023; originally announced September 2023.

  12. arXiv:2308.03757  [pdf, other]

    cs.CV

    3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields

    Authors: Brandon Y. Feng, Hadi Alzayer, Michael Rubinstein, William T. Freeman, Jia-Bin Huang

    Abstract: Motion magnification helps us visualize subtle, imperceptible motion. However, prior methods only work for 2D videos captured with a fixed camera. We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. We represent the scene with time-varying radiance fields and leverage the Eulerian principle for…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: ICCV 2023. See the project page at https://3d-motion-magnification.github.io

  13. arXiv:2306.11719  [pdf, other]

    cs.CV cs.GR cs.LG

    Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

    Authors: Ayush Tewari, Tianwei Yin, George Cazenavette, Semon Rezchikov, Joshua B. Tenenbaum, Frédo Durand, William T. Freeman, Vincent Sitzmann

    Abstract: Denoising diffusion models are a powerful type of generative model used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with…

    Submitted 16 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Project page: https://diffusion-with-forward-models.github.io/

  14. arXiv:2306.05428  [pdf, other]

    cs.CV

    Background Prompting for Improved Object Depth

    Authors: Manel Baradad, Yuanzhen Li, Forrester Cole, Michael Rubinstein, Antonio Torralba, William T. Freeman, Varun Jampani

    Abstract: Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts only using…

    Submitted 8 June, 2023; originally announced June 2023.

  15. arXiv:2305.13291  [pdf, other]

    cs.CV cs.GR cs.LG

    Materialistic: Selecting Similar Materials in Images

    Authors: Prafull Sharma, Julien Philip, Michaël Gharbi, William T. Freeman, Fredo Durand, Valentin Deschaintre

    Abstract: Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic seg…

    Submitted 22 May, 2023; originally announced May 2023.

  16. arXiv:2305.10431  [pdf, other]

    cs.CV

    FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

    Authors: Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han

    Abstract: Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastCompo…

    Submitted 21 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: The first two authors contributed equally to this work

  17. arXiv:2304.11751  [pdf, other]

    cs.CV

    Score-Based Diffusion Models as Principled Priors for Inverse Imaging

    Authors: Berthy T. Feng, Jamie Smith, Michael Rubinstein, Huiwen Chang, Katherine L. Bouman, William T. Freeman

    Abstract: Priors are essential for reconstructing images from noisy and/or incomplete measurements. The choice of the prior determines both the quality and uncertainty of recovered images. We propose turning score-based diffusion models into principled image priors ("score-based priors") for analyzing a posterior of images given measurements. Previously, probabilistic priors were limited to handcrafted regu…

    Submitted 28 August, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: ICCV 2023

  18. arXiv:2301.00704  [pdf, other]

    cs.CV cs.AI cs.LG

    Muse: Text-To-Image Generation via Masked Generative Transformers

    Authors: Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

    Abstract: We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. C…

    Submitted 2 January, 2023; originally announced January 2023.

  19. arXiv:2212.09898  [pdf, other]

    cs.CV

    MetaCLUE: Towards Comprehensive Visual Metaphors Research

    Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

    Abstract: Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, met…

    Submitted 2 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted in CVPR 2023. Project page: https://metaclue.github.io/ , Video summary: https://youtu.be/V3TmeNETL-o

  20. arXiv:2209.10077  [pdf, other]

    cs.CV cs.LG

    Can Shadows Reveal Biometric Information?

    Authors: Safa C. Medin, Amir Weiss, Frédo Durand, William T. Freeman, Gregory W. Wornell

    Abstract: We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, e…

    Submitted 4 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  21. arXiv:2207.11232  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Neural Groundplans: Persistent Neural Scene Representations from a Single Image

    Authors: Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

    Abstract: We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the bird's-eye-view (BEV) representation commonly used in vision and robotics, we propose conditional neural groundplans, ground-aligned 2D feature grids, as persistent a…

    Submitted 9 April, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Project page: https://prafullsharma.net/neural_groundplans/

  22. arXiv:2203.10712  [pdf, other]

    cs.CV

    Disentangling Architecture and Training for Optical Flow

    Authors: Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman

    Abstract: How important are training details and datasets to recent optical flow models like RAFT? And do they generalize? To explore these questions, rather than develop a new model, we revisit three prominent models, PWC-Net, IRR-PWC and RAFT, with a common set of modern training techniques and datasets, and observe significant performance gains, demonstrating the importance and generality of these traini…

    Submitted 19 September, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV22. 33 pages, including supplementals. Website at: https://autoflow-google.github.io/

  23. arXiv:2203.08414  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Unsupervised Semantic Segmentation by Distilling Feature Correspondences

    Authors: Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman

    Abstract: Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works which achieve this with a single end-to-end framework, we propose to sep…

    Submitted 16 March, 2022; originally announced March 2022.

  24. arXiv:2202.04200  [pdf, other]

    cs.CV

    MaskGIT: Masked Generative Image Transformer

    Authors: Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

    Abstract: Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor eff…

    Submitted 8 February, 2022; originally announced February 2022.

  25. arXiv:2112.12867  [pdf, other]

    cs.CV

    HSPACE: Synthetic Parametric Humans Animated in Complex Environments

    Authors: Eduard Gabriel Bazavan, Andrei Zanfir, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: Advances in the state of the art for 3d human sensing are currently limited by the lack of visual datasets with 3d ground truth, including multiple people, in motion, operating in real-world environments, with complex illumination or occlusion, and potentially observed by a moving camera. Sophisticated scene understanding would require estimating human pose and shape as well as gestures, towards r…

    Submitted 6 January, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

  26. arXiv:2109.01068  [pdf, other]

    cs.CV cs.GR

    SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting

    Authors: Varun Jampani, Huiwen Chang, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu

    Abstract: Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results. A drawback of these techniques is the use of hard depth layering, making them unable to model intricate appearance details such as thin hair-like structures. We present SLIDE, a modular and unified system…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (Oral); Project page: https://varunjampani.github.io/slide ; Video: https://www.youtube.com/watch?v=RQio7q-ueY8

  27. arXiv:2108.13027  [pdf, other]

    cs.CV

    What You Can Learn by Staring at a Blank Wall

    Authors: Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

    Abstract: We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two m…

    Submitted 30 August, 2021; originally announced August 2021.

  28. Consistent Depth of Moving Objects in Video

    Authors: Zhoutong Zhang, Forrester Cole, Richard Tucker, William T. Freeman, Tali Dekel

    Abstract: We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this underconstrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time train…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: Published at SIGGRAPH 2021

    Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 148, August 2021

  29. arXiv:2106.09336  [pdf, other]

    cs.CV

    THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers

    Authors: Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images. Key to our methodology is an intermediate 3d marker representation, where we aim to combine the predictive power of model-free-output architectures and the regularizing, anthropometrically-preserving properties of a statistical human surface model like…

    Submitted 17 June, 2021; originally announced June 2021.

  30. arXiv:2106.07627  [pdf, other]

    cs.CV

    Toward Automatic Interpretation of 3D Plots

    Authors: Laura E. Brandt, William T. Freeman

    Abstract: This paper explores the challenge of teaching a machine how to reverse-engineer the grid-marked surfaces used to represent data in 3D surface plots of two-variable functions. These are common in scientific and economic publications; and humans can often interpret them with ease, quickly gleaning general shape and curvature information from the simple collection of curves. While machines have no su…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: 16 pages, 12 figures, accepted to the 16th International Conference on Document Analysis and Recognition (ICDAR21)

  31. arXiv:2106.02634  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering

    Authors: Vincent Sitzmann, Semon Rezchikov, William T. Freeman, Joshua B. Tenenbaum, Fredo Durand

    Abstract: Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the…

    Submitted 18 January, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: First two authors contributed equally. Project website: https://vsitzmann.github.io/lfns/

  32. NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination

    Authors: Xiuming Zhang, Pratul P. Srinivasan, Boyang Deng, Paul Debevec, William T. Freeman, Jonathan T. Barron

    Abstract: We address the problem of recovering the shape and spatially-varying reflectance of an object from multi-view images (and their camera poses) of an object illuminated by one unknown lighting condition. This enables the rendering of novel views of the object under arbitrary environment lighting and editing of the object's material properties. The key to our approach, which we call Neural Radiance F…

    Submitted 21 December, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: Camera-ready version for SIGGRAPH Asia 2021. Project Page: https://people.csail.mit.edu/xiuming/projects/nerfactor/

  33. arXiv:2105.06993  [pdf, other]

    cs.CV

    Omnimatte: Associating Objects and Their Effects in Video

    Authors: Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T. Freeman, Michael Rubinstein

    Abstract: Computer vision is increasingly effective at segmenting objects in images and videos; however, scene effects related to the objects -- shadows, reflections, generated smoke, etc. -- are typically overlooked. Identifying such scene effects and associating them with the objects producing them is important for improving our fundamental understanding of visual scenes, and can also assist a variety of a…

    Submitted 30 September, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral. Project webpage: https://omnimatte.github.io/. Added references

  34. arXiv:2105.02976  [pdf, other]

    cs.CV cs.GR

    LASR: Learning Articulated Shape Reconstruction from a Monocular Video

    Authors: Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Huiwen Chang, Deva Ramanan, William T. Freeman, Ce Liu

    Abstract: Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, it is still challenging to reconstruct nonrigid structures from RGB inputs, due to its under-constrained nature. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, they canno…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: CVPR 2021. Project page: https://lasr-google.github.io/

  35. arXiv:2104.14544  [pdf, other]

    cs.CV

    AutoFlow: Learning a Better Training Set for Optical Flow

    Authors: Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

    Abstract: Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to render training data for optical flow that optimizes the performance of a model on a target dataset. AutoFlow takes a layered approach to render synthetic data,…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  36. arXiv:2104.13369  [pdf, other]

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Explaining in Style: Training a GAN to explain a classifier in StyleSpace

    Authors: Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is t…

    Submitted 1 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021. Project page: https://explaining-in-style.github.io/, Code: https://github.com/google/explaining-in-style

  37. arXiv:2103.00370  [pdf, other]

    cs.LG cs.CV cs.HC cs.IR

    Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning

    Authors: Mark Hamilton, Scott Lundberg, Lei Zhang, Stephanie Fu, William T. Freeman

    Abstract: Visual search, recommendation, and contrastive similarity learning power technologies that impact billions of users worldwide. Modern model architectures can be complex and difficult to interpret, and there are several competing techniques one can use to explain a search engine's behavior. We show that the theory of fair credit assignment provides a unique axiomatic solution that genera…

    Submitted 16 March, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

  38. arXiv:2011.10007  [pdf, other]

    cs.CV cs.LG stat.ML

    Multi-Plane Program Induction with 3D Box Priors

    Authors: Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu

    Abstract: We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene. Unlike prior work on image-based program synthesis, which assumes the image contains a single visible 2D plane, we present Box Program Induction (BPI), which infers a program-like scene representation that simultaneously…

    Submitted 22 November, 2020; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020. First two authors contributed equally. Project page: http://bpi.csail.mit.edu

  39. arXiv:2009.08044  [pdf, other]

    cs.AI cs.DB cs.DC cs.LG cs.NI

    Large-Scale Intelligent Microservices

    Authors: Mark Hamilton, Nick Gonsalves, Christina Lee, Anand Raman, Brendan Walsh, Siddhartha Prasad, Dalitso Banda, Lucy Zhang, Mei Gao, Lei Zhang, William T. Freeman

    Abstract: Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services…

    Submitted 2 December, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

  40. arXiv:2009.07833  [pdf, other]

    cs.CV cs.GR

    Layered Neural Rendering for Retiming People in Video

    Authors: Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein

    Abstract: We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationall…

    Submitted 30 September, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: In SIGGRAPH Asia 2020. Project webpage: https://retiming.github.io/. Added references

  41. arXiv:2008.06910  [pdf, other]

    cs.CV

    Neural Descent for Visual 3D Human Pose and Shape

    Authors: Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressive full body statistical 3d human model, GHUM, trained end-to-end, and learn to reconstruct its pose and shape state in a self-supervised regime. Central to our methodology is a learning to learn and optimize approach, referred to as HUmanNe…

    Submitted 14 June, 2021; v1 submitted 16 August, 2020; originally announced August 2020.

    Comments: CVPR 2021

  42. arXiv:2008.03806  [pdf, other]

    cs.CV cs.GR

    Neural Light Transport for Relighting and View Synthesis

    Authors: Xiuming Zhang, Sean Fanello, Yun-Ta Tsai, Tiancheng Sun, Tianfan Xue, Rohit Pandey, Sergio Orts-Escolano, Philip Davidson, Christoph Rhemann, Paul Debevec, Jonathan T. Barron, Ravi Ramamoorthi, William T. Freeman

    Abstract: The light transport (LT) of a scene describes how it appears under different lighting and viewing directions, and complete knowledge of a scene's LT enables the synthesis of novel views under arbitrary lighting. In this paper, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We propose a semi-parametric approach to learn a neural representation of LT t…

    Submitted 20 January, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: Camera-ready version for TOG 2021. Project Page: http://nlt.csail.mit.edu/

  43. arXiv:2007.07177  [pdf, other]

    cs.LG cs.CV cs.GR cs.IR stat.ML

    MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval

    Authors: Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp, Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris Hoder, William T. Freeman

    Abstract: We introduce MosAIc, an interactive web app that allows users to find pairs of semantically related artworks that span different cultures, media, and millennia. To create this application, we introduce Conditional Image Retrieval (CIR) which combines visual similarity search with user supplied filters or "conditions". This technique allows one to find pairs of similar images that span distinct sub…

    Submitted 27 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

  44. arXiv:2007.06059  [pdf, other]

    cs.LG cs.CV stat.ML

    It Is Likely That Your Loss Should be a Likelihood

    Authors: Mark Hamilton, Evan Shelhamer, William T. Freeman

    Abstract: Many common loss functions such as mean-squared-error, cross-entropy, and reconstruction loss are unnecessarily rigid. Under a probabilistic interpretation, these common losses correspond to distributions with fixed shapes and scales. We instead argue for optimizing full likelihoods that include parameters like the normal variance and softmax temperature. Joint optimization of these "likelihood pa…

    Submitted 2 October, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

  45. arXiv:2006.14708  [pdf, other]

    cs.CV cs.LG stat.ML

    Perspective Plane Program Induction from a Single Image

    Authors: Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, holistic scene representations further facilitate low-level image manipulation tasks such as inpainting. We formula…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. First two authors contributed equally. Project page: http://p3i.csail.mit.edu/

  46. arXiv:2006.09241  [pdf, other]

    eess.IV cs.CV

    Two-Dimensional Non-Line-of-Sight Scene Estimation from a Single Edge Occluder

    Authors: Sheila W. Seidel, John Murray-Bruce, Yanting Ma, Christopher Yu, William T. Freeman, Vivek K Goyal

    Abstract: Passive non-line-of-sight imaging methods are often faster and stealthier than their active counterparts, requiring less complex and costly equipment. However, many of these methods exploit motion of an occluder or the hidden scene, or require knowledge or calibration of complicated occluders. The edge of a wall is a known and ubiquitous occluding structure that may be used as an aperture to image…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 14 pages, 15 figures

  47. arXiv:2004.06130  [pdf, other]

    cs.CV

    SpeedNet: Learning the Speediness in Videos

    Authors: Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

    Abstract: We wish to automatically predict the "speediness" of moving objects in videos---whether they move faster, at, or slower than their "natural" speed. The core component in our approach is SpeedNet---a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring an…

    Submitted 26 July, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 (oral). Project webpage: http://speednet-cvpr20.github.io

  48. arXiv:2003.06221  [pdf, other]

    cs.CV cs.LG

    Semantic Pyramid for Image Generation

    Authors: Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

    Abstract: We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid -- a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained i…

    Submitted 16 March, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2020. CVPR 2020

  49. arXiv:1912.02314  [pdf, other]

    cs.CV cs.LG

    Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization

    Authors: Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Fredo Durand

    Abstract: We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Insp…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: 14 pages, 5 figures, Advances in Neural Information Processing Systems 2019

    Journal ref: Aittala, Miika, et al. "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization." Advances in Neural Information Processing Systems. 2019

  50. arXiv:1909.02116  [pdf, other]

    cs.CV cs.LG stat.ML

    Program-Guided Image Manipulators

    Authors: Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures. The interpretation of structures involves reasoning over repetition and symmetry of the objects in the image. In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent a…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: ICCV 2019. First two authors contributed equally. Project page: http://pgim.csail.mit.edu/