Showing 1–42 of 42 results for author: Fouhey, D

  1. arXiv:2407.06192  [pdf, other]

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

  2. arXiv:2406.18158  [pdf, other]

    cs.RO cs.CV

    3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

    Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

    Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage R…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.05132  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

    Authors: Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

    Abstract: The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is the absence of large-scale datase…

    Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Project website: https://3d-grand.github.io

  4. arXiv:2403.08768  [pdf, other]

    cs.CV

    3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface

    Authors: Linyi Jin, Nilesh Kulkarni, David Fouhey

    Abstract: This paper introduces 3DFIRES, a novel system for scene-level 3D reconstruction from posed images. Designed to work with as few as one view, 3DFIRES reconstructs the complete geometry of unseen scenes, including hidden surfaces. With multiple view inputs, our method produces full reconstruction within all camera frustums. A key feature of our approach is the fusion of multi-view information at the…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://jinlinyi.github.io/3DFIRES/

  5. arXiv:2403.03221  [pdf, other]

    cs.CV

    FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

    Authors: Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey

    Abstract: Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how t…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://crockwell.github.io/far/

  6. arXiv:2312.05251  [pdf, other]

    cs.CV

    Reconstructing Hands in 3D with Transformers

    Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

    Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand recon…

    Submitted 8 December, 2023; originally announced December 2023.

  7. arXiv:2309.12311  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

    Authors: Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai

    Abstract: 3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipe…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Project website: https://chat-with-nerf.github.io/

  8. arXiv:2307.07511  [pdf, other]

    cs.CV

    NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

    Authors: Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas

    Abstract: We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plau…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project Page with additional results available https://nileshkulkarni.github.io/nifty

  9. arXiv:2306.08731  [pdf, other]

    cs.CV

    EPIC Fields: Marrying 3D Geometry and Video Understanding

    Authors: Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

    Abstract: Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the c…

    Submitted 1 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023. 24 pages, 15 figures. Project Webpage: http://epic-kitchens.github.io/epic-fields

  10. arXiv:2306.08671  [pdf, other]

    cs.CV

    Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data

    Authors: Nilesh Kulkarni, Linyi Jin, Justin Johnson, David F. Fouhey

    Abstract: We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data. At test time, our system maps a previously unseen RGB image to a 3D reconstruction of a scene via implicit functions. While implicit functions for 3D reconstruction have often been tied to meshes, we show that we can train one using only a set of posed RGBD images. This settin…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Project page: https://nileshkulkarni.github.io/d2drdf/

  11. arXiv:2305.09664  [pdf, other]

    cs.CV

    Understanding 3D Object Interaction from a Single Image

    Authors: Shengyi Qian, David F. Fouhey

    Abstract: Humans can easily understand a single image as depicting multiple potential objects permitting interaction. We use this skill to plan our interactions with the world and accelerate understanding new objects without engaging in interaction. In this paper, we would like to endow machines with the similar ability, so that intelligent agents can better explore the 3D scene or manipulate objects. Our a…

    Submitted 4 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: ICCV 2023

  12. arXiv:2212.03239  [pdf, other]

    cs.CV

    Perspective Fields for Single Image Camera Calibration

    Authors: Linyi Jin, Jianming Zhang, Yannick Hold-Geoffroy, Oliver Wang, Kevin Matzen, Matthew Sticha, David F. Fouhey

    Abstract: Geometric camera calibration is often required for applications that understand the perspective of the image. We propose perspective fields as a representation that models the local perspective properties of an image. Perspective Fields contain per-pixel information about the camera view, parameterized as an up vector and a latitude value. This representation has a number of advantages as it makes…

    Submitted 16 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 Camera Ready. Project Page: https://jinlinyi.github.io/PerspectiveFields/

  13. arXiv:2209.15036  [pdf, other]

    astro-ph.SR astro-ph.IM cs.CV

    Large-Scale Spatial Cross-Calibration of Hinode/SOT-SP and SDO/HMI

    Authors: David F. Fouhey, Richard E. L. Higgins, Spiro K. Antiochos, Graham Barnes, Marc L. DeRosa, J. Todd Hoeksema, K. D. Leka, Yang Liu, Peter W. Schuck, Tamas I. Gombosi

    Abstract: We investigate the cross-calibration of the Hinode/SOT-SP and SDO/HMI instrument meta-data, specifically the correspondence of the scaling and pointing information. Accurate calibration of these datasets gives the correspondence needed by inter-instrument studies and learning-based magnetogram systems, and is required for physically-meaningful photospheric magnetic field vectors. We approach the p…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Under revisions at ApJS

  14. arXiv:2209.13064  [pdf, other]

    cs.CV cs.AI cs.LG

    EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

    Authors: Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

    Abstract: We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transf…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 10 pages main, 38 pages appendix. Accepted at NeurIPS 2022 Track on Datasets and Benchmarks. Data, code and leaderboards from: http://epic-kitchens.github.io/VISOR

  15. arXiv:2208.08988  [pdf, other]

    cs.CV

    The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

    Authors: Chris Rockwell, Justin Johnson, David F. Fouhey

    Abstract: We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images. Deep methods have recently shown strong progress but often require complex or multi-stage architectures. We show that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm. This inductiv…

    Submitted 23 January, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted to 3DV 2022; Project Page: https://crockwell.github.io/rel_pose/. Revision: Fixed Epipolar Lines in Figure 3, Figure 10

  16. arXiv:2208.04307  [pdf, other]

    cs.CV

    PlaneFormers: From Sparse View Planes to 3D Reconstruction

    Authors: Samir Agarwala, Linyi Jin, Chris Rockwell, David F. Fouhey

    Abstract: We present an approach for the planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging since it requires jointly reasoning about single image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer,…

    Submitted 8 August, 2022; originally announced August 2022.

    Comments: Accepted to ECCV 2022

  17. arXiv:2204.12489  [pdf, other]

    cs.CV cs.SD eess.AS

    Sound Localization by Self-Supervised Time Delay Estimation

    Authors: Ziyang Chen, David F. Fouhey, Andrew Owens

    Abstract: Sounds reach one microphone in a stereo pair sooner than the other, resulting in an interaural time delay that conveys their directions. Estimating a sound's time delay requires finding correspondences between the signals recorded by each microphone. We propose to learn these correspondences through self-supervision, drawing on recent techniques from visual tracking. We adapt the contrastive rando…

    Submitted 28 January, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: ECCV 2022

  18. arXiv:2203.16531  [pdf, other]

    cs.CV

    Understanding 3D Object Articulation in Internet Videos

    Authors: Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey

    Abstract: We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos. While seemingly easy for humans, this problem poses many challenges for computers. We propose to approach this problem by combining a top-down detection system that finds planes that can be articulated along with an optimization approach that solves for a 3D plane that can explain a s…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  19. arXiv:2112.04481  [pdf, other]

    cs.CV cs.GR

    What's Behind the Couch? Directed Ray Distance Functions (DRDF) for 3D Scene Reconstruction

    Authors: Nilesh Kulkarni, Justin Johnson, David F. Fouhey

    Abstract: We present an approach for full 3D scene reconstruction from a single unseen image. We train on a dataset of realistic non-watertight scans of scenes. Our approach predicts a distance function, since these have shown promise in handling complex topologies and large spaces. We identify and analyze two key challenges for predicting such image conditioned distance functions that have prevented their su…

    Submitted 4 April, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Updated illustrations for method section. Project Page: https://nileshkulkarni.github.io/scene_drdf

  20. arXiv:2112.01520  [pdf, other]

    cs.CV

    Recognizing Scenes from Novel Viewpoints

    Authors: Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari

    Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoint…

    Submitted 2 December, 2021; originally announced December 2021.

  21. arXiv:2108.12530  [pdf]

    cs.LG cs.AI cs.CV

    Combining chest X-rays and electronic health record (EHR) data using machine learning to diagnose acute respiratory failure

    Authors: Sarah Jabbour, David Fouhey, Ella Kazerooni, Jenna Wiens, Michael W Sjoding

    Abstract: Objective: When patients develop acute respiratory failure, accurately identifying the underlying etiology is essential for determining the best treatment. However, differentiating between common medical diagnoses can be challenging in clinical practice. Machine learning models could improve medical diagnosis by aiding in the diagnostic evaluation of these patients. Materials and Methods: Machine…

    Submitted 20 April, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

  22. arXiv:2108.12421  [pdf, other]

    astro-ph.IM astro-ph.SR cs.CV

    SynthIA: A Synthetic Inversion Approximation for the Stokes Vector Fusing SDO and Hinode into a Virtual Observatory

    Authors: Richard E. L. Higgins, David F. Fouhey, Spiro K. Antiochos, Graham Barnes, Mark C. M. Cheung, J. Todd Hoeksema, KD Leka, Yang Liu, Peter W. Schuck, Tamas I. Gombosi

    Abstract: Both NASA's Solar Dynamics Observatory (SDO) and the JAXA/NASA Hinode mission include spectropolarimetric instruments designed to measure the photospheric magnetic field. SDO's Helioseismic and Magnetic Imager (HMI) emphasizes full-disk high-cadence and good spatial resolution data acquisition while Hinode's Solar Optical Telescope Spectro-Polarimeter (SOT-SP) focuses on high spatial resolution an…

    Submitted 27 August, 2021; originally announced August 2021.

  23. arXiv:2108.05892  [pdf, other]

    cs.CV

    PixelSynth: Generating a 3D-Consistent Experience from a Single Image

    Authors: Chris Rockwell, David F. Fouhey, Justin Johnson

    Abstract: Recent advancements in differentiable rendering and 3D reasoning have driven exciting results in novel view synthesis from a single image. Despite realistic results, methods are limited to relatively small view change. In order to synthesize immersive scenes, models must also be able to extrapolate. We present an approach that fuses 3D reasoning with autoregressive modeling to outpaint large view…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: In ICCV 2021

  24. arXiv:2105.01061  [pdf, other]

    cs.CV

    Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?

    Authors: Alexander Raistrick, Nilesh Kulkarni, David F. Fouhey

    Abstract: What does bumping into things in a scene tell you about scene geometry? In this paper, we investigate the idea of learning from collisions. At the heart of our approach is the idea of collision replay, where we use examples of a collision to provide supervision for observations at a past frame. We use collision replay to train convolutional neural networks to predict a distribution over collision…

    Submitted 3 May, 2021; originally announced May 2021.

  25. arXiv:2103.17273  [pdf, other]

    astro-ph.SR astro-ph.IM cs.CV

    Fast and Accurate Emulation of the SDO/HMI Stokes Inversion with Uncertainty Quantification

    Authors: Richard E. L. Higgins, David F. Fouhey, Dichang Zhang, Spiro K. Antiochos, Graham Barnes, J. Todd Hoeksema, K. D. Leka, Yang Liu, Peter W. Schuck, Tamas I. Gombosi

    Abstract: The Helioseismic and Magnetic Imager (HMI) onboard NASA's Solar Dynamics Observatory (SDO) produces estimates of the photospheric magnetic field which are a critical input to many space weather modelling and forecasting systems. The magnetogram products produced by HMI and its analysis pipeline are the result of a per-pixel optimization that estimates solar atmospheric parameters and minimizes dis…

    Submitted 27 August, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

  26. arXiv:2103.14644  [pdf, other]

    cs.CV

    Planar Surface Reconstruction from Sparse Views

    Authors: Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey

    Abstract: The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses. While prior approaches have successfully created object-centric reconstructions of many scenes, they fail to exploit other structures, such as planes, which are typically the dominant components of indoor scenes. In this paper, we reconstruct planar surfaces from multiple views, while jointly…

    Submitted 20 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted to ICCV 2021 (Oral Presentation)

  27. arXiv:2009.10132  [pdf, other]

    cs.CV cs.AI cs.LG

    Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts

    Authors: Sarah Jabbour, David Fouhey, Ella Kazerooni, Michael W. Sjoding, Jenna Wiens

    Abstract: While deep learning has shown promise in improving the automated diagnosis of disease based on chest X-rays, deep networks may exhibit undesirable behavior related to shortcuts. This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest. For instance, clinical protocols might lead to a dataset in which…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: 32 pages, 9 figures, 12 tables, MLHC 2020

  28. arXiv:2008.06046  [pdf, other]

    cs.CV

    Full-Body Awareness from Partial Observations

    Authors: Chris Rockwell, David F. Fouhey

    Abstract: There has been great progress in human 3D mesh recovery and great interest in learning about the world from consumer video data. Unfortunately current methods for 3D human mesh recovery work rather poorly on consumer video data, since on the Internet, unusual camera viewpoints and aggressive truncations are the norm rather than a rarity. We study this problem and make a number of contributions to…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: In ECCV 2020

  29. arXiv:2007.13727  [pdf, other]

    cs.CV

    Associative3D: Volumetric Reconstruction from Sparse Views

    Authors: Shengyi Qian, Linyi Jin, David F. Fouhey

    Abstract: This paper studies the problem of 3D volumetric reconstruction from two views of a scene with an unknown camera. While seemingly easy for humans, this problem poses many challenges for computers since it requires simultaneously reconstructing objects in the two views while also figuring out their relationship. We propose a new approach that estimates reconstructions, distributions over the camera/…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  30. arXiv:2006.06669  [pdf, other]

    cs.CV

    Understanding Human Hands in Contact at Internet Scale

    Authors: Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey

    Abstract: Hands are the central means by which humans manipulate their world and being able to reliably extract hand state information from Internet videos of humans engaged in their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction method that includes: han…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: To appear at CVPR 2020 (Oral). Project and dataset webpage: http://fouheylab.eecs.umich.edu/~dandans/projects/100DOH/

  31. arXiv:2006.03586  [pdf, other]

    cs.CV

    Novel Object Viewpoint Estimation through Reconstruction Alignment

    Authors: Mohamed El Banani, Jason J. Corso, David F. Fouhey

    Abstract: The goal of this paper is to estimate the viewpoint for a novel object. Standard viewpoint estimation approaches generally fail on this task due to their reliance on a 3D model for alignment or large amounts of class-specific training data and their corresponding canonical pose. We overcome those limitations by learning a reconstruct and align approach. Our key insight is that although we do not h…

    Submitted 5 June, 2020; originally announced June 2020.

    Comments: To appear at CVPR 2020. Project page: https://mbanani.github.io/novelviewpoints/

  32. arXiv:2004.00614  [pdf, other]

    cs.CV

    Articulation-aware Canonical Surface Mapping

    Authors: Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani

    Abstract: We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our k…

    Submitted 26 May, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

    Comments: To appear at CVPR 2020, project page https://nileshkulkarni.github.io/acsm/

  33. arXiv:1903.08225  [pdf, other]

    cs.CV

    Cross-task weakly supervised learning from instructional videos

    Authors: Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic

    Abstract: In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: `pour egg' should be tra…

    Submitted 29 April, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: 18 pages, 17 figures, to be published in proceedings of the CVPR, 2019

  34. arXiv:1903.04538  [pdf, other]

    astro-ph.SR cs.AI cs.DB cs.LG

    A Machine Learning Dataset Prepared From the NASA Solar Dynamics Observatory Mission

    Authors: Richard Galvez, David F. Fouhey, Meng Jin, Alexandre Szenicer, Andrés Muñoz-Jaramillo, Mark C. M. Cheung, Paul J. Wright, Monica G. Bobra, Yang Liu, James Mason, Rajat Thomas

    Abstract: In this paper we present a curated dataset from the NASA Solar Dynamics Observatory (SDO) mission in a format suitable for machine learning research. Beginning from level 1 scientific products we have processed various instrumental corrections, downsampled to manageable spatial and temporal resolutions, and synchronized observations spatially and temporally. We illustrate the use of this dataset w…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted to The Astrophysical Journal Supplement Series; 11 pages, 8 figures

  35. arXiv:1812.00940  [pdf, other]

    cs.CV cs.LG cs.RO

    Visual Memory for Robust Path Following

    Authors: Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

    Abstract: Humans routinely retrace paths in a novel environment both forwards and backwards despite uncertainty in their motion. This paper presents an approach for doing so. Given a demonstration of a path, a first network generates a path abstraction. Equipped with this abstraction, a second network observes the world and decides how to act to retrace the path under noisy actuation and a changing environm…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: Neural Information Processing Systems (NeurIPS) 2018. Oral Presentation

  36. arXiv:1712.08125  [pdf, other]

    cs.CV cs.LG cs.RO

    Unifying Map and Landmark Based Representations for Visual Navigation

    Authors: Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik

    Abstract: This work presents a formulation for visual navigation that unifies map based spatial reasoning and path planning, with landmark based robust plan execution in noisy environments. Our proposed formulation is learned from data and is thus able to leverage statistical regularities of the world. This allows it to efficiently navigate in novel environments given only a sparse set of registered images…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: Project page with videos: https://s-gupta.github.io/cmpl/

  37. arXiv:1712.02310  [pdf, other]

    cs.CV

    From Lifestyle Vlogs to Everyday Interactions

    Authors: David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

    Abstract: A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels, and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start wit…

    Submitted 6 December, 2017; originally announced December 2017.

    Comments: Project page at: http://people.eecs.berkeley.edu/~dfouhey/2017/VLOG/

  38. arXiv:1712.01812  [pdf, other]

    cs.CV

    Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

    Authors: Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

    Abstract: The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments e…

    Submitted 24 April, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: Project url with code: https://shubhtuls.github.io/factored3d

  39. arXiv:1612.06836  [pdf, other]

    cs.CV

    From Images to 3D Shape Attributes

    Authors: David F. Fouhey, Abhinav Gupta, Andrew Zisserman

    Abstract: Our goal in this paper is to investigate properties of 3D shape that can be determined from a single image. We define 3D shape attributes -- generic properties of the shape that capture curvature, contact and occupied space. Our first objective is to infer these 3D shape attributes from a single image. A second objective is to infer a 3D shape embedding -- a low dimensional vector representing the…

    Submitted 3 December, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: Updated based on TPAMI reviews: title changed, sections reordered, moderate modifications throughout text

  40. arXiv:1603.08637  [pdf, other]

    cs.CV

    Learning a Predictable and Generative Vector Representation for Objects

    Authors: Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, Abhinav Gupta

    Abstract: What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an…

    Submitted 31 August, 2016; v1 submitted 29 March, 2016; originally announced March 2016.

    Comments: To appear in ECCV 2016. Project webpage: rohitgirdhar.github.io/GenerativePredictableVoxels/

  41. arXiv:1505.01085  [pdf, other]

    cs.CV

    In Defense of the Direct Perception of Affordances

    Authors: David F. Fouhey, Xiaolong Wang, Abhinav Gupta

    Abstract: The field of functional recognition or affordance estimation from images has seen a revival in recent years. As originally proposed by Gibson, the affordances of a scene were directly perceived from the ambient light: in other words, functional properties like sittable were estimated directly from incoming pixels. Recent work, however, has taken a mediated approach in which affordances are derived…

    Submitted 5 May, 2015; originally announced May 2015.

  42. arXiv:1411.4958  [pdf, other]

    cs.CV

    Designing Deep Networks for Surface Normal Estimation

    Authors: Xiaolong Wang, David F. Fouhey, Abhinav Gupta

    Abstract: In the past few years, convolutional neural nets (CNN) have shown incredible promise for learning visual representations. In this paper, we use CNNs for the task of predicting surface normals from a single image. But what is the right architecture we should use? We propose to build upon the decades of hard work in 3D scene understanding, to design new CNN architecture for the task of surface norma…

    Submitted 18 November, 2014; originally announced November 2014.