Showing 1–19 of 19 results for author: Genova, K

  1. arXiv:2312.03763  [pdf, other]

    cs.CV cs.GR cs.LG

    Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

    Authors: Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

    Abstract: We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-pla…

    Submitted 19 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: project webpage: https://nirvanalan.github.io/projects/gaussian3diff/

  2. arXiv:2307.07511  [pdf, other]

    cs.CV

    NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

    Authors: Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas

    Abstract: We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plau…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project Page with additional results available https://nileshkulkarni.github.io/nifty

  3. arXiv:2303.03361  [pdf, other]

    cs.CV cs.GR

    Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision

    Authors: Xiaoshuai Zhang, Abhijit Kundu, Thomas Funkhouser, Leonidas Guibas, Hao Su, Kyle Genova

    Abstract: We address efficient and structure-aware 3D scene representation from images. Nerflets are our key contribution -- a set of local neural radiance fields that together represent a scene. Each nerflet maintains its own spatial position, orientation, and extent, within which it contributes to panoptic, density, and radiance reconstructions. By leveraging only photometric and inferred panoptic image s…

    Submitted 10 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: accepted by CVPR 2023

  4. arXiv:2302.04862  [pdf, other]

    cs.CV cs.LG

    Polynomial Neural Fields for Subband Decomposition and Manipulation

    Authors: Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

    Abstract: Neural fields have emerged as a new paradigm for representing signals, thanks to their ability to do it compactly while being easy to optimize. In most applications, however, neural fields are treated like black boxes, which precludes many signal manipulation tasks. In this paper, we propose a new class of neural fields called polynomial neural fields (PNFs). The key advantage of a PNF is that it…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to NeurIPS 2022

  5. arXiv:2211.15654  [pdf, other]

    cs.CV

    OpenScene: 3D Scene Understanding with Open Vocabularies

    Authors: Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser

    Abstract: Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, t…

    Submitted 6 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page: https://pengsongyou.github.io/openscene

  6. arXiv:2205.04334  [pdf, other]

    cs.CV

    Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

    Authors: Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

    Abstract: We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time and outputs density and radiance. The background stuff is represented by a similar MLP that additional…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 paper. See project page at https://abhijitkundu.info/projects/pnf

  7. arXiv:2111.13260  [pdf, other]

    cs.CV cs.RO

    NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

    Authors: Suhani Vora, Noha Radwan, Klaus Greff, Henning Meyer, Kyle Genova, Mehdi S. M. Sajjadi, Etienne Pot, Andrea Tagliasacchi, Daniel Duckworth

    Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in implicit neural scene representations wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D…

    Submitted 2 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Project website: https://nesf3d.github.io/. Updated with minor edits to text

  8. arXiv:2110.11325  [pdf, other]

    cs.CV

    Learning 3D Semantic Segmentation with only 2D Image Supervision

    Authors: Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser

    Abstract: With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast,…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted to 3DV 2021 (Oral)

  9. arXiv:2109.05591  [pdf, other]

    cs.CV

    Multiresolution Deep Implicit Functions for 3D Shape Representation

    Authors: Zhang Chen, Yinda Zhang, Kyle Genova, Sean Fanello, Sofien Bouaziz, Christian Haene, Ruofei Du, Cem Keskin, Thomas Funkhouser, Danhang Tang

    Abstract: We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail and also achieve better accuracy. For shape completion, we propose late…

    Submitted 16 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 8 pages of main paper, 10 pages of supplementary. Accepted by ICCV'21

  10. arXiv:2108.04886  [pdf, other]

    cs.GR cs.CV

    Differentiable Surface Rendering via Non-Differentiable Sampling

    Authors: Forrester Cole, Kyle Genova, Avneesh Sud, Daniel Vlasic, Zhoutong Zhang

    Abstract: We present a method for differentiable rendering of 3D surfaces that supports both explicit and implicit representations, provides derivatives at occlusion boundaries, and is fast and simple to implement. The method first samples the surface using non-differentiable rasterization, then applies differentiable, depth-aware point splatting to produce the final image. Our approach requires no differen…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  11. arXiv:2102.13090  [pdf, other]

    cs.CV

    IBRNet: Learning Multi-View Image-Based Rendering

    Authors: Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser

    Abstract: We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple s…

    Submitted 6 April, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: CVPR 2021. Project page: https://ibrnet.github.io/

  12. arXiv:1912.06126  [pdf, other]

    cs.CV cs.GR

    Local Deep Implicit Functions for 3D Shape

    Authors: Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

    Abstract: The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations. Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a s…

    Submitted 11 June, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Camera ready version for CVPR 2020 Oral. Prior to review, this paper was referred to as DSIF, "Deep Structured Implicit Functions." 11 pages, 9 figures. Project video at https://youtu.be/3RAITzNWVJs

  13. arXiv:1911.11834  [pdf, other]

    cs.CV

    Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

    Authors: Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky

    Abstract: Computer vision models learn to perform a task by capturing relevant statistics from training data. It has been shown that models learn spurious age, gender, and race correlations when trained for seemingly unrelated tasks like activity recognition or image captioning. Various mitigation techniques have been presented to prevent models from utilizing or learning such biases. However, there has bee…

    Submitted 2 April, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in CVPR 2020

  14. arXiv:1909.05736  [pdf, other]

    cs.CV cs.GR cs.LG

    CvxNet: Learnable Convex Decomposition

    Authors: Boyang Deng, Kyle Genova, Soroosh Yazdani, Sofien Bouaziz, Geoffrey Hinton, Andrea Tagliasacchi

    Abstract: Any solid object can be decomposed into a collection of convex polytopes (in short, convexes). When a small number of convexes are used, such a decomposition can be thought of as a piece-wise approximation of the geometry. This decomposition is fundamental in computer graphics, where it provides one of the most common ways to approximate geometry, for example, in real-time physics simulation. A co…

    Submitted 12 April, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

  15. arXiv:1906.01524  [pdf, other]

    cs.CV cs.GR cs.LG

    Text-based Editing of Talking-head Video

    Authors: Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

    Abstract: Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video wi…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: A version with higher resolution images can be downloaded from the authors' website

  16. arXiv:1904.06447  [pdf, other]

    cs.CV cs.GR

    Learning Shape Templates with Structured Implicit Functions

    Authors: Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser

    Abstract: Template 3D shapes are useful for many tasks in graphics and vision, including fitting observation data, analyzing shape collections, and transferring shape attributes. Because of the variety of geometry and topology of real-world shapes, previous methods generally use a library of hand-made templates. In this paper, we investigate learning a general shape template from data. To allow for widely v…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 12 pages, 9 figures, 4 tables

  17. arXiv:1806.06098  [pdf, other]

    cs.CV

    Unsupervised Training for 3D Morphable Model Regression

    Authors: Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, William T. Freeman

    Abstract: We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objecti…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: CVPR 2018 version with supplemental material (http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)

    Journal ref: Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8377-8386

  18. arXiv:1704.02393  [pdf, other]

    cs.CV

    Learning Where to Look: Data-Driven Viewpoint Set Selection for 3D Scenes

    Authors: Kyle Genova, Manolis Savva, Angel X. Chang, Thomas Funkhouser

    Abstract: The use of rendered images, whether from completely synthetic datasets or from 3D reconstructions, is increasingly prevalent in vision tasks. However, little attention has been given to how the selection of viewpoints affects the performance of rendered training sets. In this paper, we propose a data-driven approach to view set selection. Given a set of example images, we extract statistics descri…

    Submitted 7 April, 2017; originally announced April 2017.

    Comments: ICCV submission, combined main paper and supplemental material

  19. arXiv:1506.07776  [pdf, other]

    cs.DS math.OC

    An Experimental Evaluation of the Best-of-Many Christofides' Algorithm for the Traveling Salesman Problem

    Authors: Kyle Genova, David P. Williamson

    Abstract: Recent papers on approximation algorithms for the traveling salesman problem (TSP) have given a new variant on the well-known Christofides' algorithm for the TSP, called the Best-of-Many Christofides' algorithm. The algorithm involves sampling a spanning tree from the solution of the standard LP relaxation of the TSP, subject to the condition that each edge is sampled with probability at most its val…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: An extended abstract of this paper will appear in ESA 2015