Showing 1–50 of 157 results for author: Nießner, M

Searching in archive cs.
  1. arXiv:2406.18524  [pdf, other]

    cs.CV

    MultiDiff: Consistent Novel View Synthesis from a Single Image

    Authors: Norman Müller, Katja Schwarz, Barbara Roessle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder

    Abstract: We introduce MultiDiff, a novel approach for consistent novel view synthesis of scenes from a single RGB image. The task of synthesizing novel views from a single reference image is highly ill-posed by nature, as there exist multiple, plausible explanations for unobserved areas. To address this issue, we incorporate strong priors in form of monocular depth predictors and video-diffusion models. Mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://sirwyver.github.io/MultiDiff Video: https://youtu.be/zBC4z4qXW_4 - CVPR 2024

  2. arXiv:2406.09377  [pdf, other]

    cs.CV

    GGHead: Fast and Generalizable 3D Gaussian Heads

    Authors: Tobias Kirschstein, Simon Giebenhain, Jiapeng Tang, Markos Georgopoulos, Matthias Nießner

    Abstract: Learning 3D head priors from large 2D image collections is an important step towards high-quality 3D-aware human modeling. A core requirement is an efficient architecture that scales well to large-scale datasets and large image resolutions. Unfortunately, existing 3D GANs struggle to scale to generate samples at high resolutions due to their relatively slow train and render speeds, and typically h…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://tobias-kirschstein.github.io/gghead/ ; YouTube Video: https://www.youtube.com/watch?v=1iyC74neQXc

  3. arXiv:2405.19331  [pdf, other]

    cs.CV cs.AI cs.GR

    NPGA: Neural Parametric Gaussian Avatars

    Authors: Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven appro…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: see https://simongiebenhain.github.io/NPGA/ ; Youtube Video: see https://www.youtube.com/watch?v=NGRxAYbIkus

  4. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  5. arXiv:2403.19319  [pdf, other]

    cs.CV

    Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation

    Authors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner

    Abstract: We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings from a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting issues.…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://terencecyj.github.io/projects/Mesh2NeRF/ Video: https://youtu.be/oufv1N3f7iY

  6. arXiv:2403.17550  [pdf, other]

    cs.CV cs.LG cs.RO

    DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping

    Authors: Kutay Yılmaz, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov

    Abstract: Recently, significant progress has been achieved in sensing real large-scale outdoor 3D environments, particularly by using modern acquisition equipment such as LiDAR sensors. Unfortunately, they are fundamentally limited in their ability to produce dense, complete 3D scenes. To address this issue, recent learning-based methods integrate neural implicit representations and optimizable feature grid…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  7. arXiv:2403.16318  [pdf, other]

    cs.CV

    AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans

    Authors: Cedric Perauer, Laurenz Adrian Heidrich, Haifan Zhang, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov

    Abstract: Recently, progress in acquisition equipment such as LiDAR sensors has enabled sensing increasingly spacious outdoor 3D environments. Making sense of such 3D acquisitions requires fine-grained scene understanding, such as constructing instance-based 3D scene segmentations. Commonly, a neural network is trained for this task; however, this requires access to a large, densely annotated dataset, which…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 9 pages, 7 figures

    ACM Class: I.4.6; I.2.9

  8. arXiv:2403.10615  [pdf, other]

    cs.CV cs.GR cs.LG

    LightIt: Illumination Modeling and Control for Diffusion Models

    Authors: Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, Yannick Hold-Geoffroy

    Abstract: We introduce LightIt, a method for explicit illumination control for image generation. Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single bounce s…

    Submitted 25 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Project page: https://peter-kocsis.github.io/LightIt/ Video: https://youtu.be/cCfSBD5aPLI

    ACM Class: I.4.8; I.2.10

  9. arXiv:2403.01807  [pdf, other]

    cs.CV

    ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

    Authors: Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner

    Abstract: 3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024, project page: https://lukashoel.github.io/ViewDiff/, video: https://www.youtube.com/watch?v=SdjoCqHzMMk, code: https://github.com/facebookresearch/ViewDiff

  10. arXiv:2401.06614  [pdf, other]

    cs.CV

    Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

    Authors: Wei Cao, Chang Luo, Biao Zhang, Matthias Nießner, Jiapeng Tang

    Abstract: We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks encounter challenges with ambiguous observations from noisy, partial, or sparse point clouds. To address these cha…

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  11. arXiv:2312.14140  [pdf, other]

    cs.CV

    HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs

    Authors: Artem Sevastopolsky, Philip-William Grassal, Simon Giebenhain, ShahRukh Athar, Luisa Verdoliva, Matthias Niessner

    Abstract: Current advances in human head modeling allow to generate plausible-looking 3D head models via neural representations. Nevertheless, constructing complete high-fidelity head models with explicitly controlled animation remains an issue. Furthermore, completing the head geometry based on a partial observation, e.g. coming from a depth sensor, while preserving details is often problematic for the exi…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: https://seva100.github.io/headcraft. Video: https://youtu.be/uBeBT2f1CL0. 23 pages, 19 figures, 2 tables

  12. arXiv:2312.12274  [pdf, other]

    cs.CV cs.AI cs.GR

    Intrinsic Image Diffusion for Indoor Single-view Material Estimation

    Authors: Peter Kocsis, Vincent Sitzmann, Matthias Nießner

    Abstract: We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real…

    Submitted 21 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Project page: https://peter-kocsis.github.io/IntrinsicImageDiffusion/ Video: https://youtu.be/lz0meJlj5cA

    ACM Class: I.4.8; I.2.10

  13. arXiv:2312.11417  [pdf, other]

    cs.CV

    PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models

    Authors: Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, Matthias Nießner

    Abstract: We introduce PolyDiff, the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes. In contrast to methods that use alternate 3D shape representations (e.g. implicit representations), our approach is a discrete denoising diffusion probabilistic model that operates natively on the polygonal mesh data structure. This enables learning of both the geomet…

    Submitted 18 December, 2023; originally announced December 2023.

  14. arXiv:2312.08459  [pdf, other]

    cs.CV cs.AI cs.GR cs.SD eess.AS

    FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

    Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coh…

    Submitted 17 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Paper Video: https://youtu.be/7Jf0kawrA3Q Project Page: https://shivangi-aneja.github.io/projects/facetalk/

    Journal ref: CVPR 2024

  15. arXiv:2312.07231  [pdf, other]

    cs.CV cs.AI cs.LG

    Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

    Authors: Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li

    Abstract: Diffusion Transformers have recently shown remarkable effectiveness in generating high-quality 3D point clouds. However, training voxel-based diffusion models for high-resolution 3D voxels remains prohibitively expensive due to the cubic complexity of attention operators, which arises from the additional dimension of voxels. Motivated by the inherent redundancy of 3D compared to 2D, we propose Fas…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://dit-3d.github.io/FastDiT-3D/

  16. arXiv:2312.06740  [pdf, other]

    cs.CV

    MonoNPHM: Dynamic Head Reconstruction from Monocular Videos

    Authors: Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry code…

    Submitted 29 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project Page: see https://simongiebenhain.github.io/MonoNPHM/ ; Video: see https://youtu.be/n-wjaC3UIeE

  17. arXiv:2312.02069  [pdf, other]

    cs.CV

    GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

    Authors: Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner

    Abstract: We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying p…

    Submitted 28 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://shenhanqian.github.io/gaussian-avatars

  18. arXiv:2312.01068  [pdf, other]

    cs.CV

    DPHMs: Diffusion Parametric Head Models for Depth-based Tracking

    Authors: Jiapeng Tang, Angela Dai, Yinyu Nie, Lev Markhasin, Justus Thies, Matthias Niessner

    Abstract: We introduce Diffusion Parametric Head Models (DPHMs), a generative model that enables robust volumetric head reconstruction and tracking from monocular depth sequences. While recent volumetric head models, such as NPHMs, can now excel in representing high-fidelity head geometries, tracking and reconstructing heads from real-world single-view depth sequences remains very challenging, as the fittin…

    Submitted 8 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: CVPR 2024; homepage: https://tangjiapeng.github.io/projects/DPHMs/

  19. arXiv:2312.00195  [pdf, other]

    cs.CV

    Raising the Bar of AI-generated Image Detection with CLIP

    Authors: Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva

    Abstract: The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  20. arXiv:2311.18635  [pdf, other]

    cs.CV

    DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

    Authors: Tobias Kirschstein, Simon Giebenhain, Matthias Nießner

    Abstract: DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive control over both pose and expression. We propose a diffusion-based neural renderer that leverages generic 2D priors to produce compelling images of faces. For coarse guidance of the expression and head pose, we render a neural parametric head model (NPHM) from the target viewpoint, which acts as a proxy ge…

    Submitted 16 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project Page: https://tobias-kirschstein.github.io/diffusion-avatars/ , Video: https://youtu.be/nSjDiiTnp2E

  21. arXiv:2311.18494  [pdf, other]

    cs.CV

    PRS: Sharp Feature Priors for Resolution-Free Surface Remeshing

    Authors: Natalia Soboleva, Olga Gorbunova, Maria Ivanova, Evgeny Burnaev, Matthias Nießner, Denis Zorin, Alexey Artemov

    Abstract: Surface reconstruction with preservation of geometric features is a challenging computer vision task. Despite significant progress in implicit shape reconstruction, state-of-the-art mesh extraction methods often produce aliased, perceptually distorted surfaces and lack scalability to high-resolution 3D shapes. We present a data-driven approach for automatic feature detection and remeshing that req…

    Submitted 30 November, 2023; originally announced November 2023.

  22. arXiv:2311.17261  [pdf, other]

    cs.CV

    SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

    Authors: Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

    Abstract: We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distillate diffusion latent features without accurate geometric and style cues, SceneTex formulates the texture synthesis task as an optimization proble…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Project website: https://daveredrum.github.io/SceneTex/

  23. arXiv:2311.15475  [pdf, other]

    cs.CV cs.LG

    MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

    Authors: Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner

    Abstract: We introduce MeshGPT, a new approach for generating triangle meshes that reflects the compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted by iso-surfacing methods from neural fields. Inspired by recent advances in powerful large language models, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles. We fir…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Project Page: https://nihalsid.github.io/mesh-gpt/, Video: https://youtu.be/UV90O1_69_o

  24. arXiv:2310.19516  [pdf, other]

    cs.CV

    Generating Context-Aware Natural Answers for Questions in 3D Scenes

    Authors: Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen

    Abstract: 3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the langu…

    Submitted 30 October, 2023; originally announced October 2023.

  25. arXiv:2310.07204  [pdf, other]

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  26. arXiv:2308.11417  [pdf, other]

    cs.CV

    ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

    Authors: Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, Angela Dai

    Abstract: We present ScanNet++, a large-scale dataset that couples together capture of high-quality and commodity-level geometry and color of indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera, and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of sem…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023. Video: https://youtu.be/E6P9e2r6M8I , Project page: https://cy94.github.io/scannetpp/

  27. arXiv:2307.01831  [pdf, other]

    cs.CV cs.AI cs.LG

    DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

    Authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li

    Abstract: Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape genera…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Project Page: https://dit-3d.github.io/

  28. arXiv:2306.16329  [pdf, other]

    cs.CV

    DiffComplete: Diffusion-based Generative 3D Shape Completion

    Authors: Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nießner, Chi-Wing Fu, Jiaya Jia

    Abstract: We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, we strike a balance between realism, multi-modality, and high fidelity. We propose DiffComplete by casting shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature agg…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Project Page: https://ruihangchu.com/diffcomplete.html

  29. arXiv:2306.09077  [pdf, other]

    cs.CV cs.GR

    Estimating Generic 3D Room Structures from 2D Annotations

    Authors: Denys Rozumnyi, Stefan Popov, Kevis-Kokitsi Maninis, Matthias Nießner, Vittorio Ferrari

    Abstract: Indoor rooms are among the most common use cases in 3D scene understanding. Current state-of-the-art methods for this task are driven by large annotated datasets. Room layouts are especially important, consisting of structural elements in 3D, such as wall, floor, and ceiling. However, they are difficult to annotate, especially on pure RGB video. We propose a novel method to produce generic 3D room…

    Submitted 21 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: https://github.com/google-research/cad-estate Accepted at 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

  30. arXiv:2306.09011  [pdf, other]

    cs.CV

    CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

    Authors: Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari

    Abstract: We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor. Many steps are…

    Submitted 14 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page: https://github.com/google-research/cad-estate

  31. arXiv:2306.06044  [pdf, other]

    cs.CV cs.GR eess.IV

    GANeRF: Leveraging Discriminators to Optimize Neural Radiance Fields

    Authors: Barbara Roessle, Norman Müller, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Matthias Nießner

    Abstract: Neural Radiance Fields (NeRF) have shown impressive novel view synthesis results; nonetheless, even thorough recordings yield imperfections in reconstructions, for instance due to poorly observed areas or minor lighting changes. Our goal is to mitigate these imperfections from various sources with a joint solution: we take advantage of the ability of generative adversarial networks (GANs) to produ…

    Submitted 18 September, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: SIGGRAPH Asia 2023, project page: https://barbararoessle.github.io/ganerf , video: https://youtu.be/352ccXWxQVE

    Journal ref: ACM Transactions on Graphics, Vol. 42, No. 6, Article 207 (2023) 1-14

  32. arXiv:2305.06356  [pdf, other]

    cs.CV cs.GR cs.LG

    HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion

    Authors: Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, Matthias Nießner

    Abstract: Representing human performance at high-fidelity is an essential building block in diverse applications, such as film production, computer games or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input, and enables playback from novel, unseen viewpoints.…

    Submitted 11 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Project webpage: https://synthesiaresearch.github.io/humanrf Dataset webpage: https://www.actors-hq.com/ Video: https://www.youtube.com/watch?v=OTnhiLLE7io Code: https://github.com/synthesiaresearch/humanrf

  33. NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads

    Authors: Tobias Kirschstein, Shenhan Qian, Simon Giebenhain, Tim Walter, Matthias Nießner

    Abstract: We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we coll…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Siggraph 2023, Project Page: https://tobias-kirschstein.github.io/nersemble/ , Video: https://youtu.be/a-OAWqBzldU

    Journal ref: ACM Transactions on Graphics, Volume 42, Issue 4, Article No. 161 (2023) 1-14

  34. arXiv:2303.17015  [pdf, other]

    cs.CV cs.LG

    HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

    Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

    Abstract: Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Project page: https://ziyaerkoc.com/hyperdiffusion/ Video: https://www.youtube.com/watch?v=wjFpsKdo-II

  35. arXiv:2303.14207  [pdf, other]

    cs.CV

    DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

    Authors: Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies, Matthias Nießner

    Abstract: We present DiffuScene for indoor 3D scene synthesis based on a novel scene configuration denoising diffusion model. It generates 3D instance properties stored in an unordered object set and retrieves the most similar geometry for each object configuration, which is characterized as a concatenation of different attributes, including location, size, orientation, semantics, and geometry features. We…

    Submitted 12 March, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  36. arXiv:2303.13497  [pdf, other]

    cs.CV cs.AI

    TriPlaneNet: An Encoder for EG3D Inversion

    Authors: Ananta R. Bhattarai, Matthias Nießner, Artem Sevastopolsky

    Abstract: Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads with a possibility for novel view rendering. At the same time, one must solve an inverse problem to be able to re-render or modify an existing image or video. Despite the success of universal optimization-based methods for 2D GAN inversion, those applied…

    Submitted 25 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Project page: https://anantarb.github.io/triplanenet

  37. arXiv:2303.11989  [pdf, other]

    cs.CV

    Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

    Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

    Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of…

    Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 (Oral) video: https://youtu.be/fjRnFL91EZc project page: https://lukashoel.github.io/text-to-room/ code: https://github.com/lukasHoel/text2room

  38. arXiv:2303.11396  [pdf, other]

    cs.CV

    Text2Tex: Text-driven Texture Synthesis via Diffusion Models

    Authors: Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

    Abstract: We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts. Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high resolution partial textures from multiple viewpoints. To avoid accumulating inconsistent and stretched artifacts across views, we dynamically segment the rendered…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Project page: https://daveredrum.github.io/Text2Tex/

  39. arXiv:2302.14746  [pdf, other]

    cs.CV

    Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

    Authors: Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner

    Abstract: Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets are trained to perceive the world from 2D images. However, to more effectively understand 3D structural priors in 2D backbones, we propose Mask3D to leverage existing large-scale RGB-D data in a self-supervised pre-training to embed these 3D priors into 2D learned feature representations. In contrast to tra…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted to CVPR 2023

  40. arXiv:2302.03640  [pdf, other

    cs.CV

    SSR-2D: Semantic 3D Scene Reconstruction from 2D Images

    Authors: Junwen Huang, Alexey Artemov, Yujin Chen, Shuaifeng Zhi, Kai Xu, Matthias Nießner

    Abstract: Most deep learning approaches to comprehensive semantic modeling of 3D indoor spaces require costly dense annotations in the 3D domain. In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations. The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding s… ▽ More

    Submitted 5 June, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

  41. arXiv:2301.11445  [pdf, other

    cs.CV cs.GR

    3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models

    Authors: Biao Zhang, Jiapeng Tang, Matthias Niessner, Peter Wonka

    Abstract: We introduce 3DShape2VecSet, a novel shape representation for neural fields designed for generative diffusion models. Our shape representation can encode 3D shapes given as surface models or point clouds, and represents them as neural fields. The concept of neural fields has previously been combined with a global latent vector, a regular grid of latent vectors, or an irregular grid of latent vecto… ▽ More

    Submitted 1 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by SIGGRAPH 2023 (Journal Track), Project website: https://1zb.github.io/3DShape2VecSet/, Project demo: https://youtu.be/KKQsQccpBFk

  42. arXiv:2212.09802  [pdf, other

    cs.CV cs.LG

    Panoptic Lifting for 3D Scene Understanding with Neural Fields

    Authors: Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

    Abstract: We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks… ▽ More

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Project Page: https://nihalsid.github.io/panoptic-lifting/, Video: https://youtu.be/QtsiL-6rSuM

  43. SPARF: Large-Scale Learning of 3D Sparse Radiance Fields from Few Input Images

    Authors: Abdullah Hamdi, Bernard Ghanem, Matthias Nießner

    Abstract: Recent advances in Neural Radiance Fields (NeRFs) treat the problem of novel view synthesis as Sparse Radiance Field (SRF) optimization using sparse voxels for efficient and fast rendering (Plenoxels, InstantNGP). In order to leverage machine learning and adoption of SRFs as a 3D representation, we present SPARF, a large-scale ShapeNet-based synthetic dataset for novel view synthesis consisting of… ▽ More

    Submitted 21 August, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: published at ICCV 2023 workshop proceedings

    MSC Class: 68T45

  44. arXiv:2212.02761  [pdf, other

    cs.CV

    Learning Neural Parametric Head Models

    Authors: Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner

    Abstract: We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation that disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In additio… ▽ More

    Submitted 14 April, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Project Page: https://simongiebenhain.github.io/NPHM ; Project Video: https://www.youtube.com/watch?v=0mDk2tFOJCg ; Camera-Ready Version; Added Experiments

  45. arXiv:2212.01985  [pdf, other

    cs.CV cs.LG

    ObjectMatch: Robust Registration using Canonical Object Correspondences

    Authors: Can Gümeli, Angela Dai, Matthias Nießner

    Abstract: We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences of overlapping regions between frames; however, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an o… ▽ More

    Submitted 24 March, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: Project Page: http://cangumeli.github.io/ObjectMatch Video: https://www.youtube.com/watch?v=kuXoKVrzURk

  46. ClipFace: Text-guided Editing of Textured 3D Morphable Models

    Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable model of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a… ▽ More

    Submitted 24 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Paper Video: https://youtu.be/toGOQqFuNmA Project website: https://shivangi-aneja.github.io/projects/clipface/

    Journal ref: SIGGRAPH 2023

  47. arXiv:2212.01206  [pdf, other

    cs.CV

    DiffRF: Rendering-Guided 3D Radiance Field Diffusion

    Authors: Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Matthias Nießner

    Abstract: We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model which directly operates on an explicit voxel grid representation. However,… ▽ More

    Submitted 27 March, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Project page: https://sirwyver.github.io/DiffRF/ Video: https://youtu.be/qETBcLu8SUk - CVPR 2023 Highlight - updated evaluations after fixing initial data mapping error on all methods

  48. arXiv:2212.01160  [pdf, other

    cs.CV cs.GR

    High-Res Facial Appearance Capture from Polarized Smartphone Images

    Authors: Dejan Azinović, Olivier Maury, Christophe Hery, Matthias Nießner, Justus Thies

    Abstract: We propose a novel method for high-quality facial texture reconstruction from RGB images using a novel capturing routine based on a single smartphone which we equip with an inexpensive polarization foil. Specifically, we turn the flashlight into a polarized light source and add a polarization filter on top of the camera. Leveraging this setup, we capture the face of a subject with cross-polarized… ▽ More

    Submitted 15 March, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Project page: https://dazinovic.github.io/polface/ Video: https://www.youtube.com/watch?v=jnb4V0qURtc

  49. arXiv:2212.00836  [pdf, other

    cs.CV

    UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding

    Authors: Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang

    Abstract: Performing 3D dense captioning and visual grounding requires a common and shared understanding of the underlying multimodal relationships. However, despite some previous attempts on connecting these two related tasks with highly task-specific neural modules, it remains understudied how to explicitly depict their shared nature to learn them simultaneously. In this work, we propose UniT3D, a simple… ▽ More

    Submitted 1 December, 2022; originally announced December 2022.

  50. arXiv:2211.14249  [pdf, other

    cs.CV

    Neural Poisson: Indicator Functions for Neural Fields

    Authors: Angela Dai, Matthias Nießner

    Abstract: Implicit neural field generating signed distance field representations (SDFs) of 3D shapes have shown remarkable progress in 3D shape reconstruction and generation. We introduce a new paradigm for neural field representations of 3D scenes; rather than characterizing surfaces as SDFs, we propose a Poisson-inspired characterization for surfaces as indicator functions optimized by neural fields. Cruc… ▽ More

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Video: https://youtu.be/swVsWp1-00c