
Showing 51–100 of 101 results for author: Freeman, T

  1. arXiv:2003.06221  [pdf, other]

    cs.CV cs.LG

    Semantic Pyramid for Image Generation

    Authors: Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

    Abstract: We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid -- a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained i…

    Submitted 16 March, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2020. CVPR 2020

  2. arXiv:1912.02314  [pdf, other]

    cs.CV cs.LG

    Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization

    Authors: Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Fredo Durand

    Abstract: We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Insp…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: 14 pages, 5 figures, Advances in Neural Information Processing Systems 2019

    Journal ref: Aittala, Miika, et al. "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization." Advances in Neural Information Processing Systems. 2019
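    The ill-posedness noted in this abstract can be checked numerically. The sketch below is an illustration only, not the authors' method (their regularization is not reproduced): it assumes the observed video is a matrix Z = T @ L, with T a hypothetical non-negative light transport matrix (visible pixels × hidden pixels) and L the hidden video (hidden pixels × frames), both filled with random values. Rescaling and permuting the hidden dimension gives a second non-negative pair of factors that reproduces Z exactly, so the data alone cannot single out the true hidden video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: visible-region pixels, hidden-scene pixels, video frames
n_visible, n_hidden, n_frames = 50, 8, 30

T = rng.random((n_visible, n_hidden))  # non-negative light transport matrix (assumed layout)
L = rng.random((n_hidden, n_frames))   # non-negative hidden-scene video, one column per frame
Z = T @ L                              # observed video of the visible region

# Positive diagonal scaling combined with a permutation of the hidden pixels
D = np.diag(rng.uniform(0.5, 2.0, n_hidden))
perm = np.eye(n_hidden)[rng.permutation(n_hidden)]
A = D @ perm
A_inv = perm.T @ np.diag(1.0 / np.diag(D))  # exact inverse of A, also non-negative

# A different pair of non-negative factors that explains the data equally well
T_alt = T @ A
L_alt = A_inv @ L

print(np.allclose(T_alt @ L_alt, Z))       # the alternative factors fit the data
print((T_alt >= 0).all() and (L_alt >= 0).all())  # and remain non-negative
```

    Scaling and permutation are only the simplest such ambiguities; they are chosen here because their inverses stay non-negative by construction.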

  3. arXiv:1909.02116  [pdf, other]

    cs.CV cs.LG stat.ML

    Program-Guided Image Manipulators

    Authors: Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures. The interpretation of structures involves reasoning over repetition and symmetry of the objects in the image. In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent a…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: ICCV 2019. First two authors contributed equally. Project page: http://pgim.csail.mit.edu/

  4. arXiv:1909.00475  [pdf, other]

    cs.CV

    Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

    Authors: Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

    Abstract: We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield…

    Submitted 1 September, 2019; originally announced September 2019.

    Comments: ICCV 2019
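    The long-exposure example in this abstract can be made concrete with a toy projection. This is a minimal sketch assuming that collapsing a dimension means averaging along it (a simplification, not the paper's model); the `project` helper and the moving-dot video are hypothetical. It shows a time-collapsed video turning into a streak, and that two different videos can share the same projection, which is why recovery has to be probabilistic.

```python
import numpy as np


def project(x: np.ndarray, axis: int) -> np.ndarray:
    """Collapse an array along one dimension by averaging (assumed projection model)."""
    return x.mean(axis=axis)


# Hypothetical video of shape (frames, height, width): one bright dot
# moving one pixel to the right in each frame
video = np.zeros((4, 3, 5))
for t in range(4):
    video[t, 1, t] = 1.0

# Collapsing the time axis mimics a long exposure: the dot becomes a streak
# of value 0.25 across the first four columns of row 1
long_exposure = project(video, axis=0)

# The same dot moving right-to-left yields exactly the same image,
# so the projection alone cannot tell the two videos apart
reversed_video = video[::-1]
print(np.array_equal(project(reversed_video, axis=0), long_exposure))  # True
```

    Collapsing a spatial axis instead (for example `project(video, axis=2)`) gives the analogous 1D-per-frame signal that the abstract associates with corner cameras.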

  5. arXiv:1908.07007  [pdf, other]

    cs.CV

    Boundless: Generative Adversarial Networks for Image Extension

    Authors: Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, William T. Freeman

    Abstract: Image extension models have broad applications in image editing, computational photography and computer graphics. While image inpainting has been extensively studied in the literature, it is challenging to directly apply the state-of-the-art inpainting methods to image extension as they tend to generate blurry or repetitive pixels with inconsistent semantics. We introduce semantic conditioning to…

    Submitted 19 August, 2019; originally announced August 2019.

  6. arXiv:1905.09773  [pdf, other]

    cs.CV cs.MM

    Speech2Face: Learning the Face Behind a Voice

    Authors: Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik

    Abstract: How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that al…

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR2019. Project page: http://speech2face.github.io

  7. arXiv:1904.11111  [pdf, other]

    cs.CV

    Learning the Depths of Moving People by Watching Frozen People

    Authors: Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman

    Abstract: We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 (Oral)

  8. arXiv:1904.06447  [pdf, other]

    cs.CV cs.GR

    Learning Shape Templates with Structured Implicit Functions

    Authors: Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser

    Abstract: Template 3D shapes are useful for many tasks in graphics and vision, including fitting observation data, analyzing shape collections, and transferring shape attributes. Because of the variety of geometry and topology of real-world shapes, previous methods generally use a library of hand-made templates. In this paper, we investigate learning a general shape template from data. To allow for widely v…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 12 pages, 9 figures, 4 tables

  9. arXiv:1903.05136  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Discovery of Parts, Structure, and Dynamics

    Authors: Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, fir…

    Submitted 12 March, 2019; originally announced March 2019.

    Comments: ICLR 2019. The first two authors contributed equally to this work

  10. arXiv:1901.09887  [pdf, other]

    cs.LG stat.ML

    On the Units of GANs (Extended Abstract)

    Authors: David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

    Abstract: Generative Adversarial Networks (GANs) have achieved impressive results for many real-world applications. As an active research topic, many GAN variants have emerged with improvements in sample quality and training stability. However, visualization and understanding of GANs is largely missing. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do ar…

    Submitted 6 August, 2020; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: In AAAI-19 workshop on Network Interpretability for Deep Learning. arXiv admin note: substantial text overlap with arXiv:1811.10597

  11. arXiv:1901.02875  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Learning to Infer and Execute 3D Shape Programs

    Authors: Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts. In contrast, recent advances in 3D shape sensing focus more on low-level geometry but less on these higher-level relationships. In this paper, we propos…

    Submitted 9 August, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

    Comments: ICLR 2019. Project page: http://shape2prog.csail.mit.edu

  12. arXiv:1812.11166  [pdf, other]

    cs.CV cs.AI

    Learning to Reconstruct Shapes from Unseen Classes

    Authors: Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu

    Abstract: From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Contemporary single-image 3D reconstruction algorithms aim to solve this task in a similar fashion, but often end up with priors that are highly biased by training classes. Here we present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more…

    Submitted 28 December, 2018; originally announced December 2018.

    Comments: NeurIPS 2018 (Oral). The first two authors contributed equally to this paper. Project page: http://genre.csail.mit.edu/

  13. arXiv:1812.10972  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Reasoning About Physical Interactions with Object-Oriented Prediction and Planning

    Authors: Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu

    Abstract: Object-based factorizations provide a useful level of abstraction for interacting with the world. Building explicit object representations, however, often requires supervisory signals that are difficult to obtain in practice. We present a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties. Our model, Object-Oriented…

    Submitted 7 January, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

    Comments: ICLR 2019, project page: https://people.eecs.berkeley.edu/~janner/o2p2/

  14. arXiv:1812.02725  [pdf, other]

    cs.CV cs.GR stat.ML

    Visual Object Networks: Image Generation with Disentangled 3D Representation

    Authors: Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, William T. Freeman

    Abstract: Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation. Inspired by classic graphics rende…

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu/

  15. arXiv:1811.10597  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

    Authors: David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

    Abstract: Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect…

    Submitted 8 December, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: 18 pages, 19 figures

  16. arXiv:1811.05443  [pdf, other]

    cs.LG stat.ML

    Co-regularized Alignment for Unsupervised Domain Adaptation

    Authors: Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, William T. Freeman, Gregory Wornell

    Abstract: Deep neural networks, trained with large amount of labeled data, can fail to generalize well when tested with examples from a target domain whose distribution differs from the training data distribution, referred as the source domain. It can be expensive or even infeasible to obtain required amount of labeled data in all possible domains. Unsupervised domain adaptation sets out to ad…

    Submitted 13 November, 2018; originally announced November 2018.

    Comments: NIPS 2018 accepted version

  17. arXiv:1810.01054  [pdf, other]

    cs.RO cs.AI cs.GR cs.LG

    ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics

    Authors: Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, Wojciech Matusik

    Abstract: Physical simulators have been widely used in robot planning and control. Among them, differentiable simulators are particularly favored, as they can be incorporated into gradient-based optimization algorithms that are efficient in solving inverse problems such as optimal control and motion planning. Simulating deformable objects is, however, more challenging compared to rigid body dynamics. The un…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: In submission to ICRA 2019. Supplemental Video: https://www.youtube.com/watch?v=4IWD4iGIsB4 Project Page: https://github.com/yuanming-hu/ChainQueen

  18. arXiv:1809.05491  [pdf, other]

    cs.HC cs.CV cs.GR

    MoSculp: Interactive Visualization of Shape and Time

    Authors: Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman

    Abstract: We present a system that allows users to visualize complex human motion via 3D motion sculptures---a representation that conveys the 3D structure swept by a human body as it moves through space. Given an input video, our system computes the motion sculptures and provides a user interface for rendering it in different styles, including the options to insert the sculpture back into the original vide…

    Submitted 2 January, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

    Comments: UIST 2018. Project page: http://mosculp.csail.mit.edu/

  19. arXiv:1809.05070  [pdf, other]

    cs.CV cs.AI

    Physical Primitive Decomposition

    Authors: Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will facilitate intelligent agents to better explore and interact with the world. In this paper, we study physical primitive decomposition---understanding an object through its components, each with physical and geometric…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: ECCV 2018. Project page: http://ppd.csail.mit.edu/

  20. arXiv:1809.05068  [pdf, other]

    cs.CV cs.AI

    Learning Shape Priors for Single-View 3D Completion and Reconstruction

    Authors: Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T. Freeman, Joshua B. Tenenbaum

    Abstract: The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects. Recent research in the field has tackled this problem by exploiting the expressiveness of deep convolutional networks. In fact, there is another level of ambiguity that is often overlooked…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: ECCV 2018. The first two authors contributed equally to this work. Project page: http://shapehd.csail.mit.edu/

  21. arXiv:1809.05067  [pdf, other]

    cs.CV eess.IV

    Seeing Tree Structure from Vibration

    Authors: Tianfan Xue, Jiajun Wu, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman

    Abstract: Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only. There are particular scenarios, however, where neither appearance nor spatial-temporal motion signals are informative: occluding twigs may look connected and have almost identical movements, though they belong to…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: ECCV 2018. The first two authors contributed equally to this work. Project page: http://tree.csail.mit.edu/

  22. arXiv:1808.09351  [pdf, other]

    cs.CV cs.GR eess.IV

    3D-Aware Scene Manipulation via Inverse Graphics

    Authors: Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

    Abstract: We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by…

    Submitted 18 December, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website: http://3dsdn.csail.mit.edu/

  23. Medical Image Imputation from Image Collections

    Authors: Adrian V. Dalca, Katherine L. Bouman, William T. Freeman, Natalia S. Rost, Mert R. Sabuncu, Polina Golland

    Abstract: We present an algorithm for creating high resolution anatomically plausible images consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large data sets of clinical images contain a wealth of information, time constraints during acquisition result in sparse scans that fail to capture much of the anatomy. These characteristics often render computational analysis…

    Submitted 16 August, 2018; originally announced August 2018.

    Comments: Accepted at IEEE Transactions on Medical Imaging (© 2018 IEEE)

  24. arXiv:1808.03247  [pdf, other]

    cs.CV cs.RO

    3D Shape Perception from Monocular Vision, Touch, and Shape Priors

    Authors: Shaoxiong Wang, Jiajun Wu, Xingyuan Sun, Wenzhen Yuan, William T. Freeman, Joshua B. Tenenbaum, Edward H. Adelson

    Abstract: Perceiving accurate 3D object shape is important for robots to interact with the physical world. Current research along this direction has been primarily relying on visual observations. Vision, however useful, has inherent limitations due to occlusions and the 2D-3D ambiguities, especially for perception with a monocular camera. In contrast, touch gets precise local shape information, though its e…

    Submitted 9 August, 2018; originally announced August 2018.

    Comments: IROS 2018. The first two authors contributed equally to this work

  25. Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks

    Authors: Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

    Abstract: We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To sy…

    Submitted 9 August, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

    Comments: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first two authors contributed equally to this work. Project page: http://visualdynamics.csail.mit.edu

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 41, no. 9, pp. 2236-2250, 2019

  26. arXiv:1806.06098  [pdf, other]

    cs.CV

    Unsupervised Training for 3D Morphable Model Regression

    Authors: Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, William T. Freeman

    Abstract: We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objecti…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: CVPR 2018 version with supplemental material (http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)

    Journal ref: Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8377-8386

  27. arXiv:1804.04610  [pdf, other]

    cs.CV cs.LG

    Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

    Authors: Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman

    Abstract: We study 3D shape modeling from a single image and make contributions to it in three aspects. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc. Building such a large-scale dataset, however, is highly challenging; existing d…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: CVPR 2018. The first two authors contributed equally to this work. Project page: http://pix3d.csail.mit.edu

  28. arXiv:1804.03619  [pdf, other]

    cs.SD cs.CV eess.AS

    Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

    Authors: Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein

    Abstract: We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and aud…

    Submitted 9 August, 2018; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: Accepted to SIGGRAPH 2018. Project webpage: https://looking-to-listen.github.io

    Journal ref: ACM Trans. Graph. 37(4): 112:1-112:11 (2018)

  29. arXiv:1804.02684  [pdf, other]

    cs.CV cs.GR

    Learning-based Video Motion Magnification

    Authors: Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik

    Abstract: Video motion magnification techniques allow us to see small motions previously invisible to the naked eyes, such as those of vibrating airplane wings, or swaying buildings under the influence of the wind. Because the motion is small, the magnification results are prone to noise or excessive blurring. The state of the art relies on hand-designed filters to extract representations that may not be op…

    Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: Accepted as ECCV 2018 Oral. The 1st and 2nd authors equally contributed. Video result: https://youtu.be/GrMLeEcSNzY , Project page: http://people.csail.mit.edu/tiam/deepmag/ Some bibliography information was fixed

  30. 3D Interpreter Networks for Viewer-Centered Wireframe Modeling

    Authors: Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

    Abstract: Understanding 3D object structure from a single image is an important but challenging task in computer vision, mostly due to the lack of 3D object annotations to real images. Previous research tackled this problem by either searching for a 3D shape that best explains 2D annotations, or training purely on synthetic data with ground truth 3D information. In this work, we propose 3D INterpreter Netwo…

    Submitted 9 August, 2019; v1 submitted 2 April, 2018; originally announced April 2018.

    Comments: Journal preprint of arXiv:1604.08685 (IJCV, 2018). The first two authors contributed equally to this work. Project page: http://3dinterpreter.csail.mit.edu

    Journal ref: International Journal of Computer Vision, Volume 126, Issue 9, pp 1009-1026, 2018

  31. arXiv:1712.08232  [pdf, other]

    cs.CV

    Smart, Sparse Contours to Represent and Edit Images

    Authors: Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman

    Abstract: We study the problem of reconstructing an image from information stored at contour locations. We show that high-quality reconstructions with high fidelity to the source image can be obtained from sparse input, e.g., comprising less than 6% of image pixels. This is a significant improvement over existing contour-based reconstruction methods that require much denser input to capture subtle textur…

    Submitted 9 April, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: Accepted to CVPR'18; Project page: contour2im.github.io

  32. arXiv:1712.07271  [pdf, other]

    cs.CV

    Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

    Authors: Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

    Abstract: The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, throug…

    Submitted 19 December, 2017; originally announced December 2017.

    Comments: Journal preprint of arXiv:1608.07017 (unpublished submission to IJCV)

  33. Video Enhancement with Task-Oriented Flow

    Authors: Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman

    Abstract: Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural networ…

    Submitted 10 November, 2019; v1 submitted 24 November, 2017; originally announced November 2017.

    Comments: IJCV 2019. Project page: http://toflow.csail.mit.edu

    Journal ref: International Journal of Computer Vision (IJCV), 127(8):1106-1125, 2019

  34. arXiv:1711.06297  [pdf, other]

    eess.IV

    Exploiting Occlusion in Non-Line-of-Sight Active Imaging

    Authors: Christos Thrampoulidis, Gal Shulkind, Feihu Xu, William T. Freeman, Jeffrey H. Shapiro, Antonio Torralba, Franco N. C. Wong, Gregory W. Wornell

    Abstract: Active non-line-of-sight imaging systems are of growing interest for diverse applications. The most commonly proposed approaches to date rely on exploiting time-resolved measurements, i.e., measuring the time it takes for short light pulses to transit the scene. This typically requires expensive, specialized, ultrafast lasers and detectors that must be carefully calibrated. We develop an alternati…

    Submitted 16 November, 2017; originally announced November 2017.

  35. arXiv:1711.03129  [pdf, other]

    cs.CV cs.LG cs.NE

    MarrNet: 3D Shape Reconstruction via 2.5D Sketches

    Authors: Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T. Freeman, Joshua B. Tenenbaum

    Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations are scarce in real images. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from domain adaptation when tested on real data. In this…

    Submitted 8 November, 2017; originally announced November 2017.

    Comments: NIPS 2017. The first two authors contributed equally to this paper. Project page: http://marrnet.csail.mit.edu

  36. arXiv:1711.01357  [pdf, other]

    astro-ph.IM cs.CV

    Reconstructing Video from Interferometric Measurements of Time-Varying Sources

    Authors: Katherine L. Bouman, Michael D. Johnson, Adrian V. Dalca, Andrew A. Chael, Freek Roelofs, Sheperd S. Doeleman, William T. Freeman

    Abstract: Very long baseline interferometry (VLBI) makes it possible to recover images of astronomical sources with extremely high angular resolution. Most recently, the Event Horizon Telescope (EHT) has extended VLBI to short millimeter wavelengths with a goal of achieving angular resolution sufficient for imaging the event horizons of nearby supermassive black holes. VLBI provides measurements related to…

    Submitted 1 February, 2018; v1 submitted 3 November, 2017; originally announced November 2017.

    Comments: Submitted to Transactions on Computational Imaging

  37. arXiv:1701.04851  [pdf, other]

    cs.CV stat.ML

    Synthesizing Normalized Faces from Facial Identity Features

    Authors: Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

    Abstract: We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance,…

    Submitted 17 October, 2017; v1 submitted 17 January, 2017; originally announced January 2017.

  38. arXiv:1610.07584  [pdf, other]

    cs.CV cs.LG

    Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

    Authors: Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum

    Abstract: We study the problem of 3D object generation. We propose a novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic…

    Submitted 4 January, 2017; v1 submitted 24 October, 2016; originally announced October 2016.

    Comments: NIPS 2016. The first two authors contributed equally to this work

  39. arXiv:1609.01571  [pdf, other]

    cs.CV

    Best-Buddies Similarity - Robust Template Matching using Mutual Nearest Neighbors

    Authors: Shaul Oron, Tali Dekel, Tianfan Xue, William T. Freeman, Shai Avidan

    Abstract: We propose a novel method for template matching in unconstrained environments. Its essence is the Best-Buddies Similarity (BBS), a useful, robust, and parameter-free similarity measure between two sets of points. BBS is based on counting the number of Best-Buddies Pairs (BBPs)--pairs of points in source and target sets, where each point is the nearest neighbor of the other. BBS has several key fea…

    Submitted 6 September, 2016; originally announced September 2016.
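    The Best-Buddies Pair counting described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: points are plain vectors compared by squared Euclidean distance, the function name `best_buddies_similarity` is hypothetical, and normalizing the pair count by the smaller set size is our reading of the paper's definition (treat it as an assumption).

```python
import numpy as np

def best_buddies_similarity(P: np.ndarray, Q: np.ndarray) -> float:
    """Hypothetical sketch: fraction of mutual nearest-neighbor pairs between
    point sets P (M x d) and Q (N x d), normalized by min(M, N) (assumed)."""
    # Pairwise squared Euclidean distances, shape (M, N)
    D = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    nn_in_Q = D.argmin(axis=1)  # for each p_i, index of its nearest q_j
    nn_in_P = D.argmin(axis=0)  # for each q_j, index of its nearest p_i
    # p_i and q_j are best buddies when each is the other's nearest neighbor
    mutual = nn_in_P[nn_in_Q] == np.arange(len(P))
    return float(mutual.sum() / min(len(P), len(Q)))

# Usage: the third source point is an outlier whose nearest target point
# prefers a different source point, so it forms no pair.
P = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
Q = np.array([[0.0, 0.1], [1.0, 0.1], [0.5, 0.0]])
print(round(best_buddies_similarity(P, Q), 3))  # 0.667
```

    Since every source point has exactly one nearest neighbor, each contributes at most one pair, so the score stays in [0, 1] and an outlier simply fails to count rather than inflating a distance sum.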

  40. arXiv:1608.07017  [pdf, other]

    cs.CV

    Ambient Sound Provides Supervision for Visual Learning

    Authors: Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

    Abstract: The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, throug…

    Submitted 5 December, 2016; v1 submitted 25 August, 2016; originally announced August 2016.

    Comments: ECCV 2016

  41. arXiv:1607.03034  [pdf, other]

    astro-ph.IM astro-ph.GA astro-ph.HE

    Observing---and Imaging---Active Galactic Nuclei with the Event Horizon Telescope

    Authors: Vincent L. Fish, Kazunori Akiyama, Katherine L. Bouman, Andrew A. Chael, Michael D. Johnson, Sheperd S. Doeleman, Lindy Blackburn, John F. C. Wardle, William T. Freeman, the Event Horizon Telescope Collaboration

    Abstract: Originally developed to image the shadow region of the central black hole in Sagittarius A* and in the nearby galaxy M87, the Event Horizon Telescope (EHT) provides deep, very high angular resolution data on other AGN sources too. The challenges of working with EHT data have spurred the development of new image reconstruction algorithms. This work briefly reviews the status of the EHT and its util…

    Submitted 11 July, 2016; originally announced July 2016.

    Comments: 10 pages, proceedings contribution for Blazars through Sharp Multi-Wavelength Eyes, submitted to Galaxies

  42. arXiv:1607.02586  [pdf, other]

    cs.CV cs.LG

    Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

    Authors: Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

    Abstract: We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach that models future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a sin…

    Submitted 9 July, 2016; originally announced July 2016.

    Comments: The first two authors contributed equally to this work

  43. arXiv:1605.01138  [pdf, other]

    cs.AI cs.CV q-bio.NC

    A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding

    Authors: Renqiao Zhang, Jiajun Wu, Chengkai Zhang, William T. Freeman, Joshua B. Tenenbaum

    Abstract: Humans demonstrate remarkable abilities to predict physical events in complex scenes. Two classes of models for physical scene understanding have recently been proposed: "Intuitive Physics Engines", or IPEs, which posit that people make predictions by running approximate probabilistic simulations in causal mental models similar in nature to video-game physics engines, and memory-based models, whic…

    Submitted 3 October, 2016; v1 submitted 4 May, 2016; originally announced May 2016.

    Comments: Accepted to CogSci 2016 as an oral presentation

  44. Single Image 3D Interpreter Network

    Authors: Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

    Abstract: Understanding 3D object structure from a single image is an important but difficult task in computer vision, mostly due to the lack of 3D object annotations in real images. Previous work tackles this problem by either solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information. In this work, we propose 3D INterpreter Network (3D-INN), an…

    Submitted 4 October, 2016; v1 submitted 29 April, 2016; originally announced April 2016.

    Comments: ECCV 2016 (oral). The first two authors contributed equally to this work

  45. arXiv:1512.08512  [pdf, other]

    cs.CV cs.LG cs.SD

    Visually Indicated Sounds

    Authors: Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman

    Abstract: Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people…

    Submitted 29 April, 2016; v1 submitted 28 December, 2015; originally announced December 2015.

  46. arXiv:1512.01413  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    Computational Imaging for VLBI Image Reconstruction

    Authors: Katherine L. Bouman, Michael D. Johnson, Daniel Zoran, Vincent L. Fish, Sheperd S. Doeleman, William T. Freeman

    Abstract: Very long baseline interferometry (VLBI) is a technique for imaging celestial radio emissions by simultaneously observing a source from telescopes distributed across Earth. The challenges in reconstructing images from fine angular resolution VLBI data are immense. The data is extremely sparse and noisy, thus requiring statistical image models such as those designed in the computer vision community…

    Submitted 7 November, 2016; v1 submitted 4 December, 2015; originally announced December 2015.

    Comments: Accepted for publication at CVPR 2016, Project Website: http://vlbiimaging.csail.mit.edu/, Video of Oral Presentation at CVPR June 2016: https://www.youtube.com/watch?v=YgB6o_d4tL8

    ACM Class: I.4, I.4.5, I.4.4, G.1.8, J.2

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 913-922

  47. arXiv:1409.4690  [pdf, ps, other]

    astro-ph.IM astro-ph.GA

    Imaging an Event Horizon: Mitigation of Scattering Toward Sagittarius A*

    Authors: Vincent L. Fish, Michael D. Johnson, Ru-Sen Lu, Sheperd S. Doeleman, Katherine L. Bouman, Daniel Zoran, William T. Freeman, Dimitrios Psaltis, Ramesh Narayan, Victor Pankratius, Avery E. Broderick, Carl R. Gwinn, Laura E. Vertatschitsch

    Abstract: The image of the emission surrounding the black hole in the center of the Milky Way is predicted to exhibit the imprint of general relativistic (GR) effects, including the existence of a shadow feature and a photon ring of diameter ~50 microarcseconds. Structure on these scales can be resolved by millimeter-wavelength very long baseline interferometry (VLBI). However, strong-field GR features of i…

    Submitted 16 September, 2014; originally announced September 2014.

    Comments: 7 pages, accepted to ApJ

  48. Network analysis reveals distinct clinical syndromes underlying acute mountain sickness

    Authors: David P Hall, Ian JC MacCormick, Alex T Phythian-Adams, Nina M Rzechorzek, David Hope-Jones, Sorrel Cosens, Stewart Jackson, Matthew GD Bates, David J Collier, David A Hume, Thomas Freeman, AA Roger Thompson, J Kenneth Baillie

    Abstract: Acute mountain sickness (AMS) is a common problem among visitors at high altitude, and may progress to life-threatening pulmonary and cerebral oedema in a minority of cases. International consensus defines AMS as a constellation of subjective, non-specific symptoms. Specifically, headache, sleep disturbance, fatigue and dizziness are given equal diagnostic weighting. Different pathophysiological m…

    Submitted 26 March, 2013; originally announced March 2013.

  49. arXiv:1210.4856  [pdf]

    cs.LG stat.ML

    Exploiting compositionality to explore a large space of model structures

    Authors: Roger Grosse, Ruslan R Salakhutdinov, William T. Freeman, Joshua B. Tenenbaum

    Abstract: The recent proliferation of richly structured probabilistic models raises the question of how to automatically determine an appropriate model for a dataset. We investigate this question for a space of matrix decomposition models which can express a variety of widely used models from unsupervised learning. To enable model selection, we organize these models into a context-free grammar which generat…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-306-315

  50. arXiv:0901.4275  [pdf, ps, other]

    cs.IT

    Informative Sensing

    Authors: Hyun Sung Chang, Yair Weiss, William T. Freeman

    Abstract: Compressed sensing is a recent set of mathematical results showing that sparse signals can be exactly reconstructed from a small number of linear measurements. Interestingly, for ideal sparse signals with no measurement noise, random measurements allow perfect reconstruction while measurements based on principal component analysis (PCA) or independent component analysis (ICA) do not. At the same…

    Submitted 27 January, 2009; originally announced January 2009.

    Comments: 26 pages; submitted to IEEE Transactions on Information Theory
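    The compressed-sensing background in this abstract (a sparse signal recovered from a few random linear measurements) can be illustrated with a small NumPy sketch. The decoder here is orthogonal matching pursuit (OMP), a standard greedy recovery method swapped in for illustration; it is neither the paper's informative-sensing scheme nor the L1-minimization decoder usually analyzed in compressed-sensing theory, and the function name `omp` and all problem sizes are hypothetical.

```python
import numpy as np

def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sketch of orthogonal matching pursuit: greedily
    estimate a k-sparse x such that A @ x is approximately y."""
    n = A.shape[1]
    support = []
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break  # y is already explained by the selected columns
        # Select the column most correlated with what is still unexplained
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected coefficients jointly by least squares
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Usage: m = 48 random Gaussian measurements of a 5-sparse signal of length 128.
rng = np.random.default_rng(0)
n, m, k = 128, 48, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = omp(A, A @ x_true, k)
print("max abs error:", np.abs(x_hat - x_true).max())
```

    Each iteration adds one column and refits all chosen coefficients, so the estimate has at most k nonzeros; the loop exits early once the residual vanishes.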