Showing 1–50 of 67 results for author: Tagliasacchi, A

  1. arXiv:2406.20055  [pdf, other]

    cs.CV cs.LG

    SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting

    Authors: Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J. Fleet, Andrea Tagliasacchi

    Abstract: 3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications. However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world c…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2404.13024  [pdf, other]

    cs.CV eess.IV

    BANF: Band-limited Neural Fields for Levels of Detail Reconstruction

    Authors: Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading, Lily Goli, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: Largely due to their implicit nature, neural fields lack a direct mechanism for filtering, as Fourier analysis from discrete signal processing is not directly applicable to these representations. Effective filtering of neural fields is critical to enable level-of-detail processing in downstream applications, and support operations that involve sampling the field on regular grids (e.g. marching cub…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Project Page: https://theialab.github.io/banf

  3. arXiv:2404.12547  [pdf, other]

    cs.CV

    Evaluating Alternatives to SFM Point Cloud Initialization for Gaussian Splatting

    Authors: Yalda Foroutan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: 3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome.…

    Submitted 23 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  4. arXiv:2404.09591  [pdf, other]

    cs.CV

    3D Gaussian Splatting as Markov Chain Monte Carlo

    Authors: Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings, and reliance on a good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physic…

    Submitted 16 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  5. arXiv:2403.17920  [pdf, other]

    cs.CV

    TC4D: Trajectory-Conditioned Text-to-4D Generation

    Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate; they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The la…

    Submitted 10 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sherwinbahmani.github.io/tc4d

  6. arXiv:2312.12337  [pdf, other]

    cs.CV cs.LG

    pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

    Authors: David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann

    Abstract: We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probabili…

    Submitted 4 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Project page: https://dcharatan.github.io/pixelsplat

  7. arXiv:2312.06799  [pdf, other]

    cs.CV cs.LG

    Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation

    Authors: Shaobo Xia, Jun Yue, Kacper Kania, Leyuan Fang, Andrea Tagliasacchi, Kwang Moo Yi, Weiwei Sun

    Abstract: We propose a weakly supervised semantic segmentation method for point clouds that predicts "per-point" labels from just "whole-scene" annotations while achieving the performance of recent fully supervised approaches. Our core idea is to propagate the scene-level labels to each point in the point cloud by creating pseudo labels in a conservative way. Specifically, we over-segment point cloud featur…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: The first two authors contributed equally; Project website: https://densify-your-labels.github.io/

  8. arXiv:2312.02362  [pdf, other]

    cs.CV cs.GR

    PointNeRF++: A multi-scale, point-based Neural Radiance Field

    Authors: Weiwei Sun, Eduard Trulls, Yang-Che Tseng, Sneha Sambandam, Gopal Sharma, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Point clouds offer an attractive source of information to complement images in neural scene representations, especially when few images are available. Neural rendering methods based on point clouds do exist, but they do not perform well when the point cloud quality is low -- e.g., sparse or incomplete, which is often the case with real-world data. We overcome these problems with a simple represent…

    Submitted 21 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project website: https://pointnerfpp.github.io/

  9. arXiv:2312.02202  [pdf, other]

    cs.GR cs.CV

    Volumetric Rendering with Baked Quadrature Fields

    Authors: Gopal Sharma, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: We propose a novel Neural Radiance Field (NeRF) representation for non-opaque scenes that allows fast inference by utilizing textured polygons. Despite the high-quality novel view rendering that NeRF provides, a critical limitation is that it relies on volume rendering that can be computationally expensive and does not utilize the advancements in modern graphics hardware. Existing methods for this…

    Submitted 2 December, 2023; originally announced December 2023.

  10. arXiv:2312.00075  [pdf, other]

    cs.CV

    Accelerating Neural Field Training via Soft Mining

    Authors: Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: We present an approach to accelerate Neural Field training by efficiently selecting sampling locations. While Neural Fields have recently become popular, it is often trained by uniformly sampling the training domain, or through handcrafted heuristics. We show that improved convergence and final training quality can be achieved by a soft mining technique based on importance sampling: rather than ei…

    Submitted 29 November, 2023; originally announced December 2023.

  11. arXiv:2312.00065  [pdf, other]

    cs.CV

    Unsupervised Keypoints from Pretrained Diffusion Models

    Authors: Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Unsupervised learning of keypoints and landmarks has seen significant progress with the help of modern neural network architectures, but performance is yet to match the supervised counterpart, making their practicability questionable. We leverage the emergent knowledge within text-to-image diffusion models, towards more robust unsupervised keypoints. Our core idea is to find text embeddings that w…

    Submitted 21 May, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

  12. arXiv:2311.17984  [pdf, other]

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

  13. arXiv:2309.03185  [pdf, other]

    cs.CV

    Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields

    Authors: Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, Andrea Tagliasacchi

    Abstract: Neural Radiance Fields (NeRFs) have shown promise in applications like view synthesis and depth estimation, but learning from multiview images faces inherent uncertainties. Current methods to quantify them are either heuristic or computationally demanding. We introduce BayesRays, a post-hoc framework to evaluate uncertainty in any pre-trained NeRF without modifying the training process. Our method…

    Submitted 6 September, 2023; originally announced September 2023.

  14. arXiv:2306.08943  [pdf, other]

    cs.LG math.NA

    Neural Fields with Hard Constraints of Arbitrary Differential Order

    Authors: Fangcheng Zhong, Kyle Fogarty, Param Hanji, Tianhao Wu, Alejandro Sztrajman, Andrew Spielberg, Andrea Tagliasacchi, Petra Bosilj, Cengiz Oztireli

    Abstract: While deep learning techniques have become extremely popular for solving a broad range of optimization problems, methods to enforce hard constraints during optimization, particularly on deep neural networks, remain underdeveloped. Inspired by the rich literature on meshless interpolation and its extension to spectral collocation methods in scientific computing, we develop a series of approaches fo…

    Submitted 29 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  15. arXiv:2305.15581  [pdf, other]

    cs.CV

    Unsupervised Semantic Correspondence Using Stable Diffusion

    Authors: Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Text-to-image diffusion models are now capable of generating images that are often indistinguishable from real images. To generate such images, these models must understand the semantics of the objects they are asked to generate. In this work we show that, without any training, one can leverage this semantic knowledge within diffusion models to find semantic correspondences - locations in multiple…

    Submitted 23 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Project website: https://github.com/ubc-vision/LDM_correspondences

  16. arXiv:2305.07514  [pdf, other]

    cs.CV cs.GR

    BlendFields: Few-Shot Example-Driven Facial Modeling

    Authors: Kacper Kania, Stephan J. Garbin, Andrea Tagliasacchi, Virginia Estellers, Kwang Moo Yi, Julien Valentin, Tomasz Trzciński, Marek Kowalski

    Abstract: Generating faithful visualizations of human faces requires capturing both coarse and fine-level details of the face geometry and appearance. Existing methods are either data-driven, requiring an extensive corpus of data not publicly accessible to the research community, or fail to capture fine details because they rely on geometric face models that cannot represent fine-grained details in texture…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2023. Project page: https://blendfields.github.io/

  17. arXiv:2303.12074  [pdf, other]

    cs.CV

    CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

    Abstract: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D l…

    Submitted 8 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; Webpage: https://sherwinbahmani.github.io/cc3d/

  18. arXiv:2302.00833  [pdf, other]

    cs.CV cs.LG

    RobustNeRF: Ignoring Distractors with Robust Losses

    Authors: Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, Andrea Tagliasacchi

    Abstract: Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling dist…

    Submitted 1 February, 2023; originally announced February 2023.

  19. arXiv:2211.16991  [pdf, other]

    cs.CV

    SparsePose: Sparse-View Camera Pose Regression and Refinement

    Authors: Samarth Sinha, Jason Y. Zhang, Andrea Tagliasacchi, Igor Gilitschenski, David B. Lindell

    Abstract: Camera pose estimation is a key step in standard 3D reconstruction pipelines that operate on a dense set of images of a single object or scene. However, methods for pose estimation often fail when only a few images are available because they rely on the ability to robustly identify and match visual features between image pairs. While these methods can work robustly with dense camera views, capturi…

    Submitted 29 November, 2022; originally announced November 2022.

  20. arXiv:2211.15654  [pdf, other]

    cs.CV

    OpenScene: 3D Scene Understanding with Open Vocabularies

    Authors: Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser

    Abstract: Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, t…

    Submitted 6 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page: https://pengsongyou.github.io/openscene

  21. arXiv:2211.11177  [pdf, other]

    cs.CV

    NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

    Authors: Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, Yasutaka Furukawa

    Abstract: This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scen…

    Submitted 26 March, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: CVPR2023

  22. arXiv:2211.01600  [pdf, other]

    cs.CV cs.AI cs.RO

    nerf2nerf: Pairwise Registration of Neural Radiance Fields

    Authors: Lily Goli, Daniel Rebain, Sara Sabour, Animesh Garg, Andrea Tagliasacchi

    Abstract: We introduce a technique for pairwise registration of neural fields that extends classical optimization-based local registration (i.e. ICP) to operate on Neural Radiance Fields (NeRF) -- neural 3D scene representations trained from collections of calibrated images. NeRF does not decompose illumination and color, so to make registration invariant to illumination, we introduce the concept of a ''sur…

    Submitted 3 November, 2022; originally announced November 2022.

  23. arXiv:2210.06965  [pdf, other]

    cs.LG cs.CV

    CUF: Continuous Upsampling Filters

    Authors: Cristina Vasconcelos, Cengiz Oztireli, Mark Matthews, Milad Hashemi, Kevin Swersky, Andrea Tagliasacchi

    Abstract: Neural fields have rapidly been adopted for representing 3D signals, but their application to more classical 2D image-processing has been relatively limited. In this paper, we consider one of the most important operations in image processing: upsampling. In deep learning, learnable upsampling layers have extensively been used for single image super-resolution. We propose to parameterize upsampling…

    Submitted 20 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  24. arXiv:2210.04628  [pdf, other]

    cs.CV cs.GR cs.LG

    Novel View Synthesis with Diffusion Models

    Authors: Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, Mohammad Norouzi

    Abstract: We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to translate a single input view into consistent and sharp completions across many views. The core component of 3DiM is a pose-conditional image-to-image diffusion model, which takes a source view and its pose as inputs, and generates a novel view for a target pose as output. 3DiM can generate multiple views that are 3D…

    Submitted 6 October, 2022; originally announced October 2022.

  25. arXiv:2209.10684  [pdf, other]

    cs.CV

    Attention Beats Concatenation for Conditioning Neural Fields

    Authors: Daniel Rebain, Mark J. Matthews, Kwang Moo Yi, Gopal Sharma, Dmitry Lagun, Andrea Tagliasacchi

    Abstract: Neural fields model signals by mapping coordinate inputs to sampled values. They are becoming an increasingly important backbone architecture across many fields from vision and graphics to biology and astronomy. In this paper, we explore the differences between common conditioning mechanisms within these networks, an essential ingredient in shifting neural fields from memorization of signals to ge…

    Submitted 21 September, 2022; originally announced September 2022.

  26. arXiv:2209.02417  [pdf, ps, other]

    cs.CV cs.GR

    Volume Rendering Digest (for NeRF)

    Authors: Andrea Tagliasacchi, Ben Mildenhall

    Abstract: Neural Radiance Fields employ simple volume rendering as a way to overcome the challenges of differentiating through ray-triangle intersections by leveraging a probabilistic notion of visibility. This is achieved by assuming the scene is composed by a cloud of light-emitting particles whose density changes in space. This technical report summarizes the derivations for differentiable volume renderi…

    Submitted 29 August, 2022; originally announced September 2022.

    Comments: Overleaf: https://www.overleaf.com/read/fkhpkzxhnyws

    ACM Class: I.3; I.4

  27. arXiv:2208.00277  [pdf, other]

    cs.CV cs.GR cs.LG

    MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

    Authors: Zhiqin Chen, Thomas Funkhouser, Peter Hedman, Andrea Tagliasacchi

    Abstract: Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficie…

    Submitted 29 May, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

    Comments: CVPR 2023. Project page: https://mobile-nerf.github.io, code: https://github.com/google-research/jax3d/tree/main/jax3d/projects/mobilenerf

  28. arXiv:2207.09978  [pdf, other]

    cs.CV cs.GR

    NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds

    Authors: Weiwei Sun, Daniel Rebain, Renjie Liao, Vladimir Tankovich, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: We introduce a method for instance proposal generation for 3D point clouds. Existing techniques typically directly regress proposals in a single feed-forward step, leading to inaccurate estimation. We show that this serves as a critical bottleneck, and propose a method based on iterative bilateral filtering with learned kernels. Following the spirit of bilateral filtering, we consider both the dee…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Project website: https://neuralbf.github.io

  29. arXiv:2205.15838  [pdf, other]

    cs.CV

    D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video

    Authors: Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli

    Abstract: Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence. Existing solutions usually approach this problem in the image domain, limiting their performance and understanding of the environment. We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a…

    Submitted 5 November, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  30. arXiv:2205.04334  [pdf, other]

    cs.CV

    Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

    Authors: Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

    Abstract: We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time and outputs density and radiance. The background stuff is represented by a similar MLP that additional…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 paper. See project page at https://abhijitkundu.info/projects/pnf

  31. arXiv:2203.03570  [pdf, other]

    cs.CV cs.GR cs.LG

    Kubric: A scalable dataset generator

    Authors: Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, et al. (10 additional authors not shown)

    Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 21 pages, CVPR2022

  32. arXiv:2202.01999  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Dual Contouring

    Authors: Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, Hao Zhang

    Abstract: We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions tha…

    Submitted 31 May, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: Accepted to SIGGRAPH (journal) 2022. Code: https://github.com/czq142857/NDC

    Journal ref: Neural Dual Contouring. ACM Trans. Graph. 41, 4, Article 104 (July 2022), 13 pages

  33. arXiv:2112.05124  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

    Authors: Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann

    Abstract: We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. W…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: Website: https://yilundu.github.io/ndf/. First two authors contributed equally (order determined by coin flip), last two authors equal advising

  34. arXiv:2112.01983  [pdf, other]

    cs.CV cs.GR

    CoNeRF: Controllable Neural Radiance Fields

    Authors: Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, Andrea Tagliasacchi

    Abstract: We extend neural 3D representations to allow for intuitive and interpretable user control beyond novel view rendering (i.e. camera control). We allow the user to annotate which part of the scene one wishes to control with just a small number of mask annotations in the training images. Our key idea is to treat the attributes as latent variables that are regressed by the neural network given the sce…

    Submitted 6 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: Project page: https://conerf.github.io/

  35. arXiv:2111.14643  [pdf, other]

    cs.CV cs.GR

    Urban Radiance Fields

    Authors: Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Thomas Funkhouser, Vittorio Ferrari

    Abstract: The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). Given a sequence of posed RGB images and lidar sweeps acquired by cameras and scanners moving through an outdoor scene, we produce a model from which 3D surfaces can be extracted and novel RGB…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Project: https://urban-radiance-fields.github.io/

  36. arXiv:2111.13260  [pdf, other]

    cs.CV cs.RO

    NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

    Authors: Suhani Vora, Noha Radwan, Klaus Greff, Henning Meyer, Kyle Genova, Mehdi S. M. Sajjadi, Etienne Pot, Andrea Tagliasacchi, Daniel Duckworth

    Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in implicit neural scene representations wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D…

    Submitted 2 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Project website: https://nesf3d.github.io/. Updated with minor edits to text

  37. arXiv:2111.13152  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

    Authors: Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

    Abstract: A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel sc…

    Submitted 29 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022, Project website: https://srt-paper.github.io/

    Journal ref: CVPR 2022

  38. arXiv:2111.13112  [pdf, other]

    cs.CV

    VaxNeRF: Revisiting the Classic for Voxel-Accelerated Neural Radiance Field

    Authors: Naruya Kondo, Yuya Ikeda, Andrea Tagliasacchi, Yutaka Matsuo, Yoichi Ochiai, Shixiang Shane Gu

    Abstract: Neural Radiance Field (NeRF) is a popular method in data-driven 3D reconstruction. Given its simplicity and high quality rendering, many NeRF applications are being developed. However, NeRF's big limitation is its slow speed. Many attempts are made to speeding up NeRF training and inference, including intricate code-level optimization and caching, use of sophisticated data structures, and amortiza…

    Submitted 25 November, 2021; originally announced November 2021.

  39. arXiv:2111.09996  [pdf, other]

    cs.CV

    LOLNeRF: Learn from One Look

    Authors: Daniel Rebain, Mark Matthews, Kwang Moo Yi, Dmitry Lagun, Andrea Tagliasacchi

    Abstract: We present a method for learning a generative 3D model based on neural radiance fields, trained solely from data with only single views of each object. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure such that they can be rendered from different views is non-trivial. We show that, unlike existing methods, one does not need multi-view data t…

    Submitted 25 April, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: See https://lolnerf.github.io for additional results

  40. arXiv:2108.04869  [pdf, other]

    cs.CV cs.LG

    MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision

    Authors: Ben Usman, Andrea Tagliasacchi, Kate Saenko, Avneesh Sud

    Abstract: In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date. We show how to train a neural model to perform this task with high precision and minimal latency overhead. The proposed model takes into account joint location uncertainty due to occlusion from multiple views, and requires only 2D keypoint data for training. Our…

    Submitted 25 November, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

  41. Learning Mesh Representations via Binary Space Partitioning Tree Networks

    Authors: Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang

    Abstract: Polygonal meshes are ubiquitous, but have only played a relatively minor role in the deep learning revolution. State-of-the-art neural generative models for 3D shapes learn implicit functions and generate meshes via expensive iso-surfacing. We overcome these challenges by employing a classical spatial data structure from computer graphics, Binary Space Partitioning (BSP), to facilitate 3D learning…

    Submitted 1 July, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

    Comments: Accepted to TPAMI. This is the extended journal version of BSP-Net (arXiv:1911.06971) from CVPR 2020

  42. arXiv:2106.03804  [pdf, other]

    cs.GR cs.CV

    Deep Medial Fields

    Authors: Daniel Rebain, Ke Li, Vincent Sitzmann, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: Implicit representations of geometry, such as occupancy fields or signed distance fields (SDF), have recently re-gained popularity in encoding 3D solid shape in a functional form. In this work, we introduce medial fields: a field function derived from the medial axis transform (MAT) that makes available information about the underlying 3D geometry that is immediately useful for a number of downstr…

    Submitted 7 June, 2021; originally announced June 2021.

  43. arXiv:2104.12229  [pdf, other]

    cs.CV

    Vector Neurons: A General Framework for SO(3)-Equivariant Networks

    Authors: Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacchi, Leonidas Guibas

    Abstract: Invariance and equivariance to the rotation group have been widely discussed in the 3D deep learning community for pointclouds. Yet most proposed methods either use complex mathematical tools that may limit their accessibility, or are tied to specific input data types and network architectures. In this paper, we introduce a general framework built on top of what we call Vector Neuron representatio…

    Submitted 25 April, 2021; originally announced April 2021.

  44. arXiv:2103.14167  [pdf, other]

    cs.CV

    COTR: Correspondence Transformer for Matching Across Images

    Authors: Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: We propose a novel framework for finding correspondences in images based on a deep neural network that, given two images and a query point in one of them, finds its correspondence in the other. By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings. Importantly, in order to capture both…

    Submitted 17 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  45. arXiv:2012.10518  [pdf, other]

    cs.CV

    Human 3D keypoints via spatial uncertainty modeling

    Authors: Francis Williams, Or Litany, Avneesh Sud, Kevin Swersky, Andrea Tagliasacchi

    Abstract: We introduce a technique for 3D human keypoint estimation that directly models the notion of spatial uncertainty of a keypoint. Our technique employs a principled approach to modelling spatial uncertainty inspired from techniques in robust statistics. Furthermore, our pipeline requires no 3D ground truth labels, relying instead on (possibly noisy) 2D image-level keypoints. Our method achieves near…

    Submitted 18 December, 2020; originally announced December 2020.

  46. arXiv:2012.04718  [pdf, other]

    cs.CV cs.LG

    Canonical Capsules: Self-Supervised Capsules in Canonical Pose

    Authors: Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey Hinton, Kwang Moo Yi

    Abstract: We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equ…

    Submitted 24 November, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2021; The first two authors contributed equally; Project website: https://canonical-capsules.github.io

  47. arXiv:2011.13920  [pdf, other]

    cs.CV cs.LG

    Unsupervised part representation by Flow Capsules

    Authors: Sara Sabour, Andrea Tagliasacchi, Soroosh Yazdani, Geoffrey E. Hinton, David J. Fleet

    Abstract: Capsule networks aim to parse images into a hierarchy of objects, parts and relations. While promising, they remain limited by an inability to learn effective low level part descriptions. To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image. During training we exploit motion as a powerful perceptual cue for part definition, with an e…

    Submitted 19 February, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

  48. arXiv:2011.12490  [pdf, other]

    cs.CV cs.GR

    DeRF: Decomposed Radiance Fields

    Authors: Daniel Rebain, Wei Jiang, Soroosh Yazdani, Ke Li, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: With the advent of Neural Radiance Fields (NeRF), neural networks can now render novel views of a 3D scene with quality that fools the human eye. Yet, generating these images is very computationally intensive, limiting their applicability in practical scenarios. In this paper, we propose a technique based on spatial decomposition capable of mitigating this issue. Our key observation is that there…

    Submitted 24 November, 2020; originally announced November 2020.

  49. arXiv:2010.11339  [pdf, ps, other]

    cs.CV

    Voronoi Convolutional Neural Networks

    Authors: Soroosh Yazdani, Andrea Tagliasacchi

    Abstract: In this technical report, we investigate extending convolutional neural networks to the setting where functions are not sampled in a grid pattern. We show that by treating the samples as the average of a function within a cell, we can find a natural equivalent of most layers used in CNN. We also present an algorithm for running inference for these models exactly using standard convex geometry algo…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Technical report

    ACM Class: I.4.0

  50. LSMAT Least Squares Medial Axis Transform

    Authors: Daniel Rebain, Baptiste Angles, Julien Valentin, Nicholas Vining, Jiju Peethambaran, Shahram Izadi, Andrea Tagliasacchi

    Abstract: The medial axis transform has applications in numerous fields including visualization, computer graphics, and computer vision. Unfortunately, traditional medial axis transformations are usually brittle in the presence of outliers, perturbations and/or noise along the boundary of objects. To overcome this limitation, we introduce a new formulation of the medial axis transform which is naturally rob…

    Submitted 10 October, 2020; originally announced October 2020.

    Journal ref: Computer Graphics Forum 38 (2019) 5-18