Showing 1–50 of 158 results for author: Fidler, S

  1. arXiv:2406.12095  [pdf, other]

    cs.CV cs.AI cs.RO

    DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

    Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

    Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB,…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.10324  [pdf, other]

    cs.CV cs.LG

    L4GM: Large 4D Gaussian Reconstruction Model

    Authors: Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

    Abstract: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/l4gm

  3. arXiv:2406.08292  [pdf, other]

    cs.CV

    Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

    Authors: Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar

    Abstract: We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled and beyond spatial limits of LiDAR scans, taking a step towards generating realistic, high-resolution simulation-ready 3D street environments. We propose hierarchical Gener…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 as highlight

  4. arXiv:2404.16221  [pdf, other]

    cs.CV cs.DC cs.GR

    NeRF-XL: Scaling NeRFs with Multiple GPUs

    Authors: Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

    Abstract: We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improve…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Webpage: https://research.nvidia.com/labs/toronto-ai/nerfxl/

  5. arXiv:2404.14507  [pdf, other]

    cs.CV cs.LG

    Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

    Authors: Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

    Abstract: Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/

  6. arXiv:2404.10765  [pdf, other]

    cs.CV

    RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

    Authors: Ashkan Mirzaei, Riccardo De Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

    Abstract: Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability is still posing a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plau…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project page: https://reffusion.github.io

  7. arXiv:2404.06510  [pdf, other]

    cs.CV

    Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?

    Authors: Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna

    Abstract: Enhancing semantic grounding abilities in Vision-Language Models (VLMs) often involves collecting domain-specific training data, refining the network architectures, or modifying the training recipes. In this work, we venture into an orthogonal direction and explore whether VLMs can improve their semantic grounding by "receiving" feedback, without requiring in-domain data, fine-tuning, or modificat…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 31 pages, 15 figures

  8. arXiv:2403.15385  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

    Authors: Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

    Abstract: Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so t…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: See the project website at https://research.nvidia.com/labs/toronto-ai/LATTE3D/

    MSC Class: 68T45

    ACM Class: I.2.6; I.2.7; I.3.6; I.3.7

  9. arXiv:2403.15370  [pdf, other]

    cs.CV cs.LG cs.RO

    Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

    Authors: Aqeel Anwar, Tae Eun Choe, Zian Wang, Sanja Fidler, Minwoo Park

    Abstract: Detecting a diverse range of objects under various driving scenarios is essential for the effectiveness of autonomous driving systems. However, the real-world data collected often lacks the necessary diversity presenting a long-tail distribution. Although synthetic data has been utilized to overcome this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 17 pages, 15 figures, 7 tables

  10. arXiv:2401.11739  [pdf, other]

    cs.CV cs.LG

    EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

    Authors: Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

    Abstract: Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks. However, generating fine-grained segmentation masks with diffusion models often requires additional training on annotated datasets, leaving it unclear to what extent pre-trained diffusion models alone understand the semantic relations of their generated imag…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: ICLR 2024. Project page: https://kmcode1.github.io/Projects/EmerDiff/

  11. arXiv:2312.17241  [pdf, other]

    cs.CV cs.GR

    Compact Neural Graphics Primitives with Learned Hash Probing

    Authors: Towaki Takikawa, Thomas Müller, Merlin Nimier-David, Alex Evans, Sanja Fidler, Alec Jacobson, Alexander Keller

    Abstract: Neural graphics primitives are faster and achieve higher quality when their neural networks are augmented by spatial data structures that hold trainable features arranged in a grid. However, existing feature grids either come with a large memory footprint (dense or factorized grids, trees, and hash tables) or slow performance (index learning and vector quantization). In this paper, we show that a…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/compact-ngp

  12. arXiv:2312.13763  [pdf, other]

    cs.CV cs.LG

    Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

    Authors: Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

    Abstract: Text-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional gener…

    Submitted 3 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/AlignYourGaussians/

  13. arXiv:2312.04535  [pdf, other]

    cs.LG cs.RO

    Trajeglish: Traffic Modeling as Next-Token Prediction

    Authors: Jonah Philion, Xue Bin Peng, Sanja Fidler

    Abstract: A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using…

    Submitted 14 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ICLR 2024

  14. arXiv:2312.03806  [pdf, other]

    cs.CV cs.GR cs.LG

    XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

    Authors: Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams

    Abstract: We present XCube (abbreviated as $\mathcal{X}^3$), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes. Our model can generate millions of voxels with a finest effective resolution of up to $1024^3$ in a feed-forward fashion without time-consuming test-time optimization. To achieve this, we employ a hierarchical voxel latent diffusion model which generates…

    Submitted 25 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight. Code: https://github.com/nv-tlabs/XCube/ Website: https://research.nvidia.com/labs/toronto-ai/xcube/

  15. arXiv:2311.13570  [pdf, other]

    cs.CV

    WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

    Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

    Abstract: Modern learning-based approaches to 3D-aware image synthesis achieve high photorealism and 3D-consistent viewpoint changes for the generated images. Existing approaches represent instances in a shared canonical space. However, for in-the-wild datasets a shared canonical system can be difficult to define or might not even exist. In this work, we instead model instances in view space, alleviating th…

    Submitted 12 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  16. arXiv:2311.10091  [pdf, other]

    cs.CV cs.GR

    Adaptive Shells for Efficient Neural Radiance Field Rendering

    Authors: Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic

    Abstract: Neural radiance fields achieve unprecedented quality for novel view synthesis, but their volumetric formulation remains expensive, requiring a huge number of samples to render high-resolution images. Volumetric encodings are essential to represent fuzzy geometry such as foliage and hair, and they are well-suited for stochastic optimization. Yet, many scenes ultimately consist largely of solid surf…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: SIGGRAPH Asia 2023. Project page: research.nvidia.com/labs/toronto-ai/adaptive-shells/

  17. arXiv:2311.04391  [pdf, other]

    cs.CV

    3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

    Authors: Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

    Abstract: We present 3DiffTection, a state-of-the-art method for 3D object detection from single images, leveraging features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming. Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these features are in…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/3difftection/

  18. arXiv:2311.02077  [pdf, other]

    cs.CV

    EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

    Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

    Abstract: We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. Grounded in neural fields, EmerNeRF simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. EmerNeRF hinges upon two core components: First, it stratifies scenes into static and dynamic fields. This decomposition emerges purely from…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: See the project page for code, data, and request pre-trained models: https://emernerf.github.io

  19. arXiv:2310.19731  [pdf, other]

    cs.CV cs.AI cs.LG

    ViR: Towards Efficient Vision Retention Backbones

    Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

    Abstract: Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios whic…

    Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling

  20. arXiv:2310.13772  [pdf, other]

    cs.CV cs.LG

    TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

    Authors: Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin

    Abstract: We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture sy…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Videos and more results on https://research.nvidia.com/labs/toronto-ai/texfusion/

    ACM Class: I.3.3

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023) 4169-4181

  21. arXiv:2309.05192  [pdf, other]

    cs.CV

    Towards Viewpoint Robustness in Bird's Eye View Segmentation

    Authors: Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

    Abstract: Autonomous vehicles (AV) require that neural networks used for perception be robust to different viewpoints if they are to be deployed across many types of vehicles without the repeated cost of data collection and labeling for each. AV companies typically focus on collecting data from diverse scenarios and locations, but not camera rig configurations, due to cost. As a result, only a small number…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Project Page: https://nvlabs.github.io/viewpoint-robustness

  22. arXiv:2308.05371  [pdf, other]

    cs.GR cs.CV cs.LG

    Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

    Authors: Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, Jun Gao

    Abstract: This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these tech…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: SIGGRAPH 2023. Project page: https://research.nvidia.com/labs/toronto-ai/flexicubes/

    Journal ref: ACM Transactions on Graphics, Volume 42, Issue 4, Article No.: 37, August 2023

  23. arXiv:2307.07487  [pdf, other]

    cs.CV cs.LG

    DreamTeacher: Pretraining Image Backbones with Deep Generative Models

    Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

    Abstract: In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/DreamTeacher/

  24. arXiv:2306.07349  [pdf, other]

    cs.LG cs.AI cs.CV

    ATT3D: Amortized Text-to-3D Object Synthesis

    Authors: Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

    Abstract: Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 22 pages, 20 figures

    MSC Class: 68T45

    ACM Class: I.2.6; I.2.7; I.3.6; I.3.7

  25. arXiv:2305.19590  [pdf, other]

    cs.CV

    Neural Kernel Surface Reconstruction

    Authors: Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, Francis Williams

    Abstract: We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud. Our approach builds upon the recently introduced Neural Kernel Fields (NKF) representation. It enjoys similar generalization capabilities to NKF, while simultaneously addressing its main limitations: (a) We can scale to large scenes through compactly supported kernel functions, whi…

    Submitted 9 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: CVPR 2023

  26. arXiv:2305.01643  [pdf, other]

    cs.CV

    Neural LiDAR Fields for Novel View Synthesis

    Authors: Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany

    Abstract: We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. NFL combines the rendering power of neural fields with a detailed, physically motivated model of the LiDAR sensing process, thus enabling it to accurately reproduce key sensor behaviors like beam diver…

    Submitted 13 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICCV 2023 - camera ready. Project page: https://research.nvidia.com/labs/toronto-ai/nfl/

  27. arXiv:2304.09787  [pdf, other]

    cs.CV

    NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

    Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

    Abstract: Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first trai…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  28. arXiv:2304.08818  [pdf, other]

    cs.CV cs.LG

    Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

    Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

    Abstract: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by int…

    Submitted 27 December, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

  29. arXiv:2304.03266  [pdf, other]

    cs.CV

    Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes

    Authors: Zian Wang, Tianchang Shen, Jun Gao, Shengyu Huang, Jacob Munkberg, Jon Hasselgren, Zan Gojcic, Wenzheng Chen, Sanja Fidler

    Abstract: Reconstruction and intrinsic decomposition of scenes from captured imagery would enable many applications such as relighting and virtual object insertion. Recent NeRF based methods achieve impressive fidelity of 3D reconstruction, but bake the lighting and shadows into the radiance field, while mesh-based methods that facilitate intrinsic decomposition through differentiable rendering have not yet…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page: https://nv-tlabs.github.io/fegr/

  30. arXiv:2304.01893  [pdf, other]

    cs.CV cs.GR cs.LG

    Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

    Authors: Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

    Abstract: We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target way…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  31. arXiv:2302.12251  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

    Authors: Yiming Li, Zhiding Yu, Christopher Choy, Chaowei Xiao, Jose M. Alvarez, Sanja Fidler, Chen Feng, Anima Anandkumar

    Abstract: Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. Our framework adopts a two-stage design where we start from a…

    Submitted 25 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: CVPR 2023 Highlight (10% of accepted papers, 2.5% of submissions)

  32. arXiv:2302.04832  [pdf, other]

    cs.CV

    Bridging the Sim2Real gap with CARE: Supervised Detection Adaptation with Conditional Alignment and Reweighting

    Authors: Viraj Prabhu, David Acuna, Andrew Liao, Rafid Mahmood, Marc T. Law, Judy Hoffman, Sanja Fidler, James Lucas

    Abstract: Sim2Real domain adaptation (DA) research focuses on the constrained setting of adapting from a labeled synthetic source domain to an unlabeled or sparsely labeled real target domain. However, for high-stakes applications (e.g. autonomous driving), it is common to have a modest amount of human-labeled real data in addition to plentiful auto-labeled source data (e.g. from a driving simulator). We st…

    Submitted 9 February, 2023; originally announced February 2023.

  33. arXiv:2302.00883  [pdf, other]

    cs.GR cs.AI cs.LG

    Synthesizing Physical Character-Scene Interactions

    Authors: Mohamed Hassan, Yunrong Guo, Tingwu Wang, Michael Black, Sanja Fidler, Xue Bin Peng

    Abstract: Movement is how people interact with and affect their environment. For realistic character animation, it is necessary to synthesize such interactions between virtual characters and their surroundings. Despite recent progress in character animation using machine learning, most systems focus on controlling an agent's movements in fairly simple and homogeneous environments, with limited interactions…

    Submitted 2 February, 2023; originally announced February 2023.

  34. arXiv:2301.13868  [pdf, other]

    cs.LG cs.AI cs.CL cs.GR

    PADL: Language-Directed Physics-Based Character Control

    Authors: Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

    Abstract: Developing systems that can synthesize natural and life-like motions for simulated characters has long been a focus for computer animation. But in order for these systems to be useful for downstream applications, they need not only produce high-quality motions, but must also provide an accessible and versatile interface through which users can direct a character's behaviors. Natural language provi…

    Submitted 31 January, 2023; originally announced January 2023.

  35. arXiv:2211.10440  [pdf, other]

    cs.CV cs.GR cs.LG

    Magic3D: High-Resolution Text-to-3D Content Creation

    Authors: Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

    Abstract: DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. I…

    Submitted 25 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted to CVPR 2023 as highlight. Project website: https://research.nvidia.com/labs/dir/magic3d

  36. arXiv:2210.06978  [pdf, other]

    cs.CV cs.LG stat.ML

    LION: Latent Point Diffusion Models for 3D Shape Generation

    Authors: Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis

    Abstract: Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the hierarchical…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  37. arXiv:2210.03007  [pdf, other]

    cs.CV cs.GR cs.LG

    XDGAN: Multi-Modal 3D Shape Generation in 2D Space

    Authors: Hassan Abu Alhaija, Alara Dirik, André Knörig, Sanja Fidler, Maria Shugrina

    Abstract: Generative models for 2D images have recently seen tremendous progress in quality, resolution and speed as a result of the efficiency of 2D convolutional architectures. However, it is difficult to extend this progress into the 3D domain since most current 3D representations rely on custom network components. This paper addresses a central question: Is it possible to directly leverage 2D image genera…

    Submitted 6 October, 2022; originally announced October 2022.

  38. arXiv:2210.01234  [pdf, other]

    cs.LG cs.AI cs.CV

    Optimizing Data Collection for Machine Learning

    Authors: Rafid Mahmood, James Lucas, Jose M. Alvarez, Sanja Fidler, Marc T. Law

    Abstract: Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  39. arXiv:2209.13064  [pdf, other]

    cs.CV cs.AI cs.LG

    EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

    Authors: Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

    Abstract: We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transf…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 10 pages main, 38 pages appendix. Accepted at NeurIPS 2022 Track on Datasets and Benchmarks. Data, code and leaderboards from: http://epic-kitchens.github.io/VISOR

  40. arXiv:2209.11163  [pdf, other]

    cs.CV

    GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

    Authors: Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler

    Abstract: As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines, thus immediately usable in downstream a…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, Project Page: https://nv-tlabs.github.io/GET3D/

  41. arXiv:2208.09480  [pdf, other]

    cs.CV

    Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

    Authors: Zian Wang, Wenzheng Chen, David Acuna, Jan Kautz, Sanja Fidler

    Abstract: We consider the challenging problem of outdoor lighting estimation for the goal of photorealistic virtual object insertion into photographs. Existing works on outdoor lighting estimation typically simplify the scene lighting into an environment map which cannot capture the spatially-varying lighting effects in outdoor scenes. In this work, we propose a neural approach that estimates the 5D HDR lig…

    Submitted 19 August, 2022; originally announced August 2022.

    Comments: Webpage: https://nv-tlabs.github.io/outdoor-ar/

    Journal ref: ECCV 2022

  42. arXiv:2208.08580  [pdf, other]

    cs.CV cs.AI cs.GR

    MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation

    Authors: Gopal Sharma, Kangxue Yin, Subhransu Maji, Evangelos Kalogerakis, Or Litany, Sanja Fidler

    Abstract: We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. This is inspired by the observation that view-based surface representations are more effective at modeling high-resolution surface details and texture than their 3D counterparts based on point clouds or voxel occupancy. Specifically, given a 3D shape, we render it from multiple views, an…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: project page: https://nv-tlabs.github.io/MvDeCor/

  43. arXiv:2207.02126  [pdf, other]

    cs.CV

    Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

    Authors: Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler

    Abstract: Existing transformer-based image backbones typically propagate feature information in one direction, from lower to higher levels. This may not be ideal, since the localization ability to delineate accurate object boundaries is most prominent in the lower, high-resolution feature maps, while the semantics that can disambiguate image signals belonging to one object vs. another typically emerge in a…

    Submitted 5 July, 2022; originally announced July 2022.

  44. arXiv:2207.01725  [pdf, other]

    cs.CV cs.LG

    How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

    Authors: Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law

    Abstract: Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomous driving or medical imaging where collecting data is expensive and time-consuming. Overestimating or underestimating data requirements incurs substantial costs that could be avoided with…

    Submitted 13 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to CVPR 2022

  45. arXiv:2206.09386  [pdf, other]

    cs.LG cs.CV

    Scalable Neural Data Server: A Data Recommender for Transfer Learning

    Authors: Tianshi Cao, Sasha Doubov, David Acuna, Sanja Fidler

    Abstract: The absence of large-scale labeled data in the practitioner's target domain can be a bottleneck to applying machine learning algorithms in practice. Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance, but finding the most relevant data to transfer from can be challenging. Neural Data Server (NDS), a search engine that recommends relevant data f…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2021

    Journal ref: Advances in Neural Information Processing Systems, Volume 34, pages 8984-8997, year 2021

  46. arXiv:2206.07707  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM

    Variable Bitrate Neural Fields

    Authors: Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, Sanja Fidler

    Abstract: Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. State-of-the-art results are obtained by conditioning a neural approximation with a lookup from trainable feature grids that take on part of the learning task and allow for smaller, more efficient neural networks. Unfortunately, these fea…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: SIGGRAPH 2022. Project Page: https://nv-tlabs.github.io/vqad/

  47. arXiv:2206.02903  [pdf, other]

    cs.CV

    Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps

    Authors: Seung Wook Kim, Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler

    Abstract: Modern image generative models show remarkable sample quality when trained on a single domain or class of objects. In this work, we introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains. We leverage the fact that a variety of object classes share common attributes, with certain geometric differences. We propose Polymorphic-G…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: CVPR 2022 Oral

  48. arXiv:2205.01906  [pdf, other]

    cs.GR cs.AI cs.LG

    ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters

    Authors: Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, Sanja Fidler

    Abstract: The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to what is common practice in physic…

    Submitted 5 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

  49. arXiv:2204.05088  [pdf, other]

    cs.CV

    M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

    Authors: Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

    Abstract: In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs. Unlike the majority of previous works which separately process detection and segmentation, M$^2$BEV infers both tasks with a unified model and improves efficiency. M$^2$BEV efficiently transforms multi-view 2D image…

    Submitted 19 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Tech Report

  50. arXiv:2204.03105  [pdf, other]

    cs.CV cs.GR cs.LG

    AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

    Authors: Zhiqin Chen, Kangxue Yin, Sanja Fidler

    Abstract: In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis. Previous works either apply spherical texture maps which may lead to large distortions, or use continuous texture fields that yield smooth outputs lacking details. We argue that the traditional way of representing textures with images and link…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: CVPR 2022. Project page: https://nv-tlabs.github.io/AUV-NET