
Showing 1–50 of 99 results for author: Wetzstein, G

Searching in archive cs.
  1. arXiv:2406.19126  [pdf, other]

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  2. arXiv:2406.18717  [pdf, other]

    cs.CV

    Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

    Authors: Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas

    Abstract: Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet, these methods assume d…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.11819  [pdf, other]

    cs.CV

    MegaScenes: Scene-Level View Synthesis at Scale

    Authors: Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

    Abstract: Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Our project page is at https://megascenes.github.io

  4. arXiv:2406.10454  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    HumanPlus: Humanoid Shadowing and Imitation from Humans

    Authors: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

    Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: project website: https://humanoid-ai.github.io/

  5. arXiv:2406.09413  [pdf, other]

    cs.CV cs.GR cs.LG

    Interpreting the Weight Space of Customized Diffusion Models

    Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman

    Abstract: We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/weights2weights

  6. arXiv:2406.04239  [pdf, other]

    cs.LG

    Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

    Authors: Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frédéric Poitevin, Ellen D. Zhong, Gordon Wetzstein

    Abstract: The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these invers…

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2405.18424  [pdf, other]

    cs.CV

    3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

    Authors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang

    Abstract: Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods focus solely on either 2D individual-object editing or 3D global-scene editing. This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. In this work, we propose 3DitScene, a novel and unified scene editing framework…

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.17531  [pdf, other]

    cs.CV

    Evolutive Rendering Models

    Authors: Fangneng Zhan, Hanxue Liang, Yifan Wang, Michael Niemeyer, Michael Oechsle, Adam Kortylewski, Cengiz Oztireli, Gordon Wetzstein, Christian Theobalt

    Abstract: The landscape of computer graphics has undergone significant transformations with the recent advances of differentiable rendering models. These rendering models often rely on heuristic designs that may not fully align with the final rendering objectives. We address this gap by pioneering evolutive rendering models, a methodology where rendering models possess the ability to evolve and ada…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://fnzhan.com/Evolutive-Rendering-Models/

  9. arXiv:2405.17414  [pdf, other]

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene…

    Submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2404.16829  [pdf, other]

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 23 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  11. arXiv:2404.11810  [pdf, other]

    cs.GR

    Holographic Parallax Improves 3D Perceptual Realism

    Authors: Dongyeon Kim, Seung-Woo Nam, Suyeon Choi, Jong-Mo Seo, Gordon Wetzstein, Yoonchan Jeong

    Abstract: Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 33 pages, 34 figures

  12. arXiv:2404.06493  [pdf, other]

    cs.CV eess.IV

    Flying with Photons: Rendering Novel Views of Propagating Light

    Authors: Anagh Malik, Noah Juravsky, Ryan Po, Gordon Wetzstein, Kiriakos N. Kutulakos, David B. Lindell

    Abstract: We present an imaging and neural rendering technique that seeks to synthesize videos of light propagating through a scene from novel, moving camera viewpoints. Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution. Combined with this dataset, we introduce an efficient neural volume rendering fram…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Project page: https://anaghmalik.com/FlyingWithPhotons/

  13. arXiv:2404.04421  [pdf, other]

    cs.GR cs.CV

    PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

    Authors: Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

    Abstract: Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along w…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Project Page: https://qingqing-zhao.github.io/PhysAvatar

  14. arXiv:2404.02101  [pdf, other]

    cs.CV

    CameraCtrl: Enabling Camera Control for Text-to-Video Generation

    Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

    Abstract: Controllability plays a crucial role in video generation since it allows users to create desired content. However, existing models have largely overlooked the precise control of camera pose that serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely paramet…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Project page: https://hehao13.github.io/projects-CameraCtrl/ Code: https://github.com/hehao13/CameraCtrl

  15. arXiv:2403.17920  [pdf, other]

    cs.CV

    TC4D: Trajectory-Conditioned Text-to-4D Generation

    Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate: they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The la…

    Submitted 10 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sherwinbahmani.github.io/tc4d

  16. arXiv:2403.14621  [pdf, other]

    cs.CV

    GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

    Authors: Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein

    Abstract: We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://justimyhxu.github.io/projects/grm/ Code: https://github.com/justimyhxu/GRM

  17. arXiv:2403.12032  [pdf, other]

    cs.CV cs.GR

    Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

    Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

    Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denois…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

  18. arXiv:2402.14000  [pdf, other]

    cs.CV

    Real-time 3D-aware Portrait Editing from a Single Image

    Authors: Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

    Abstract: This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. Such a design brings two comp…

    Submitted 2 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  19. arXiv:2401.17217  [pdf, other]

    cs.HC cs.CV

    GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear

    Authors: Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein

    Abstract: Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual…

    Submitted 31 January, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Project video: https://youtu.be/AuDFHHTK_m8

  20. arXiv:2401.04092  [pdf, other]

    cs.CV

    GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

    Authors: Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

    Abstract: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligns with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Project page: https://gpteval3d.github.io/ ; Code: https://github.com/3DTopia/GPTEval3D

  21. arXiv:2312.14432  [pdf, other]

    cs.CV cs.LG q-bio.BM

    Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning

    Authors: Jay Shenoy, Axel Levy, Frédéric Poitevin, Gordon Wetzstein

    Abstract: X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI) where individual, weakly scattering biomolecules are imaged under near-physiological conditions with the opportunity to access fleeting states th…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: http://jayshenoy.com/xrai

  22. arXiv:2312.02432  [pdf, other]

    cs.CV

    Orthogonal Adaptation for Modular Customization of Diffusion Models

    Authors: Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein

    Abstract: Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilitate high-fidelity customization for individual concepts or a limited, pre-defined set of them, they fall short of achieving scalability, where a single model can…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://ryanpo.com/ortha/

  23. arXiv:2312.01409  [pdf, other]

    cs.CV cs.AI cs.GR

    Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

    Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

    Abstract: Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hinderin…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Project page: https://primecai.github.io/generative_rendering/

  24. arXiv:2311.17984  [pdf, other]

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

  25. arXiv:2311.17857  [pdf, other]

    cs.CV cs.GR

    Gaussian Shell Maps for Efficient 3D Human Generation

    Authors: Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein

    Abstract: Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering th…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://rameenabdal.github.io/GaussianShellMaps/

  26. arXiv:2311.13177  [pdf, other]

    physics.med-ph cs.CV

    Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI

    Authors: Annesha Ghosh, Gordon Wetzstein, Mert Pilanci, Sara Fridovich-Keil

    Abstract: Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the ti…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Code is available at https://github.com/sarafridov/volumetric-propeller

  27. arXiv:2311.09217  [pdf, other]

    cs.CV

    DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

    Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang

    Abstract: We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in ~30 s on a single A100 GPU. We train DMV3D on larg…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Project Page: https://justimyhxu.github.io/projects/dmv3d/

  28. arXiv:2310.20249  [pdf, other]

    cs.CV cs.GR cs.LG

    Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior

    Authors: Qingqing Zhao, Peizhuo Li, Wang Yifan, Olga Sorkine-Hornung, Gordon Wetzstein

    Abstract: Creating believable motions for various characters has long been a goal in computer graphics. Current learning-based motion synthesis methods depend on extensive motion datasets, which are often challenging, if not impossible, to obtain. On the other hand, pose data is more accessible, since static posed characters are easier to create and can even be extracted from images using recent advancement…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Project page: https://cyanzhao42.github.io/pose2motion

  29. arXiv:2310.07204  [pdf, other]

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  30. arXiv:2310.03956  [pdf, other]

    cs.CV math.OC physics.med-ph

    Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

    Authors: Sara Fridovich-Keil, Fabrizio Valdivia, Gordon Wetzstein, Benjamin Recht, Mahdi Soltanolkotabi

    Abstract: In computed tomography (CT), the forward model consists of a linear Radon transform followed by an exponential nonlinearity based on the attenuation of light according to the Beer-Lambert Law. Conventional reconstruction often involves inverting this nonlinearity as a preprocessing step and then solving a convex inverse problem. However, this nonlinear measurement preprocessing required to use the…

    Submitted 5 October, 2023; originally announced October 2023.

  31. arXiv:2309.01811  [pdf, other]

    cs.CV

    Instant Continual Learning of Neural Radiance Fields

    Authors: Ryan Po, Zhengyang Dong, Alexander W. Bergman, Gordon Wetzstein

    Abstract: Neural radiance fields (NeRFs) have emerged as an effective method for novel-view synthesis and 3D scene reconstruction. However, conventional training methods require access to all training views during scene optimization. This assumption may be prohibitive in continual learning scenarios, where new data is acquired in a sequential manner and a continuous update of the NeRF is desired, as in auto…

    Submitted 5 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: For project page please visit https://ryanpo.com/icngp/

  32. arXiv:2307.15055  [pdf, other]

    cs.CV

    PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

    Authors: Yang Zheng, Adam W. Harley, Bokui Shen, Gordon Wetzstein, Leonidas J. Guibas

    Abstract: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state of the art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to m…

    Submitted 27 July, 2023; originally announced July 2023.

  33. arXiv:2307.05462  [pdf, other]

    cs.CV

    Efficient 3D Articulated Human Generation with Layered Surface Volumes

    Authors: Yinghao Xu, Wang Yifan, Alexander W. Bergman, Menglei Chai, Bolei Zhou, Gordon Wetzstein

    Abstract: Access to high-quality and diverse 3D articulated digital human assets is crucial in various applications, ranging from virtual reality to social platforms. Generative approaches, such as 3D generative adversarial networks (GANs), are rapidly replacing laborious manual content creation tools. However, existing 3D GAN frameworks typically rely on scene representations that leverage either template…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Project page: https://www.computationalimaging.org/publications/lsv/ Demo: https://www.youtube.com/watch?v=vahgMFCM3j4

  34. arXiv:2307.04859  [pdf, other]

    cs.CV cs.GR cs.LG

    Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

    Authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein

    Abstract: The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Project website: http://www.computationalimaging.org/publications/articulated-diffusion/

  35. Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization

    Authors: Connor Z. Lin, Koki Nagano, Jan Kautz, Eric R. Chan, Umar Iqbal, Leonidas Guibas, Gordon Wetzstein, Sameh Khamis

    Abstract: There is a growing demand for the accessible creation of high-quality 3D avatars that are animatable and customizable. Although 3D morphable models provide intuitive control for editing and animation, and robustness for single-view face reconstruction, they cannot easily capture geometric and appearance details. Methods based on neural implicit representations, such as signed distance functions (S…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH 2023, Project Page: https://research.nvidia.com/labs/toronto-ai/ssif

  36. arXiv:2305.01122  [pdf, other]

    cs.LG cs.CE

    Learning Controllable Adaptive Simulation for Multi-resolution Physics

    Authors: Tailin Wu, Takashi Maruyama, Qingqing Zhao, Gordon Wetzstein, Jure Leskovec

    Abstract: Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-resolution dynamics: a small fraction of the system is extremely dynamic, and requires very fine-grained resolution, while a majority of the system is changing slowly and can be modeled by coarser spatial scales. Typical learning-base…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: ICLR 2023, notable top-25% (spotlight), 19 pages, 9 figures

  37. arXiv:2304.13153  [pdf, other]

    cs.CV cs.GR cs.LG

    LumiGAN: Unconditional Generation of Relightable 3D Human Faces

    Authors: Boyang Deng, Yifan Wang, Gordon Wetzstein

    Abstract: Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces wit…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Project page: https://boyangdeng.com/projects/lumigan

  38. arXiv:2304.05440  [pdf, other]

    cs.CV

    PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

    Authors: Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein

    Abstract: Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors…

    Submitted 11 April, 2023; originally announced April 2023.

  39. arXiv:2304.02602  [pdf, other]

    cs.CV cs.AI cs.GR

    Generative Novel View Synthesis with 3D-Aware Diffusion Models

    Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

    Abstract: We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorp…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://nvlabs.github.io/genvs

  40. arXiv:2303.12218  [pdf, other]

    cs.CV

    Compositional 3D Scene Generation using Locally Conditioned Diffusion

    Authors: Ryan Po, Gordon Wetzstein

    Abstract: Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text p…

    Submitted 22 March, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: For project page, see https://ryanpo.com/comp3d/

  41. arXiv:2303.12074  [pdf, other]

    cs.CV

    CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

    Abstract: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D l…

    Submitted 8 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; Webpage: https://sherwinbahmani.github.io/cc3d/

  42. arXiv:2303.11364  [pdf, other]

    cs.CV

    DehazeNeRF: Multiple Image Haze Removal and 3D Shape Reconstruction using Neural Radiance Fields

    Authors: Wei-Ting Chen, Wang Yifan, Sy-Yen Kuo, Gordon Wetzstein

    Abstract: Neural radiance fields (NeRFs) have demonstrated state-of-the-art performance for 3D computer vision tasks, including novel view synthesis and 3D shape reconstruction. However, these methods fail in adverse weather conditions. To address this challenge, we introduce DehazeNeRF as a framework that robustly operates in hazy conditions. DehazeNeRF extends the volume rendering equation by adding physi…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: including supplemental material; project page: https://www.computationalimaging.org/publications/dehazenerf

  43. arXiv:2303.08096  [pdf, other]

    cs.CV

    MELON: NeRF with Unposed Images in SO(3)

    Authors: Axel Levy, Mark Matthews, Matan Sela, Gordon Wetzstein, Dmitry Lagun

    Abstract: Neural radiance fields enable novel-view synthesis and scene reconstruction with photorealistic quality from a few images, but require known and accurate camera poses. Conventional pose estimation algorithms fail on smooth or self-similar scenes, while methods performing inverse rendering from unposed views require a rough initialization of the camera orientations. The main difficulty of pose esti…

    Submitted 19 July, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  44. arXiv:2303.04291  [pdf, other]

    eess.IV cs.CV

    Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition

    Authors: Cindy M. Nguyen, Eric R. Chan, Alexander W. Bergman, Gordon Wetzstein

    Abstract: Capturing images is a key part of automation for high-level tasks such as scene text recognition. Low-light conditions pose a challenge for high-level perception stacks, which are often optimized on well-lit, artifact-free images. Reconstruction methods for low-light images can produce well-lit counterparts, but typically at the cost of high-frequency details critical for downstream tasks. We prop…

    Submitted 30 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: WACV 2024. Project website: https://ccnguyen.github.io/diffusion-in-the-dark/

  45. arXiv:2302.01368  [pdf, other]

    cs.HC cs.GR eess.IV

    Towards Attention-aware Foveated Rendering

    Authors: Brooke Krajancich, Petr Kellnhofer, Gordon Wetzstein

    Abstract: Foveated graphics is a promising approach to solving the bandwidth challenges of immersive virtual and augmented reality displays by exploiting the falloff in spatial acuity in the periphery of the visual field. However, the perceptual models used in these applications neglect the effects of higher-level cognitive processing, namely the allocation of visual attention, and are thus overestimating s…

    Submitted 10 May, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 10 pages, 6 figures

  46. arXiv:2212.10699  [pdf, other]

    cs.CV cs.GR

    PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields

    Authors: Zhengfei Kuang, Fujun Luan, Sai Bi, Zhixin Shu, Gordon Wetzstein, Kalyan Sunkavalli

    Abstract: Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) base…

    Submitted 24 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  47. arXiv:2212.08377  [pdf, other]

    cs.CV cs.GR

    PointAvatar: Deformable Point-based Head Avatars from Videos

    Authors: Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, Otmar Hilliges

    Abstract: The ability to create realistic, animatable and relightable head avatars from casual video sequences would open up wide-ranging applications in communication and entertainment. Current methods either build on explicit 3D morphable meshes (3DMM) or exploit neural implicit representations. The former are limited by fixed topology, while the latter are non-trivial to deform and inefficient to render.…

    Submitted 28 February, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Project page: https://zhengyuf.github.io/PointAvatar/ Code base: https://github.com/zhengyuf/pointavatar

  48. arXiv:2212.04096  [pdf, other]

    cs.CV

    ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

    Authors: Zhen Wang, Shijie Zhou, Jeong Joon Park, Despoina Paschalidou, Suya You, Gordon Wetzstein, Leonidas Guibas, Achuta Kadambi

    Abstract: This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid…

    Submitted 8 December, 2022; originally announced December 2022.

  49. arXiv:2211.17260  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene

    Authors: Minjung Son, Jeong Joon Park, Leonidas Guibas, Gordon Wetzstein

    Abstract: Generative models have shown great promise in synthesizing photorealistic 3D objects, but they require large amounts of training data. We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene. Once trained, SinGRAF generates different realizations of this 3D scene that preserve the appearance of the input while varying scene layout. For this purpo…

    Submitted 2 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page: https://www.computationalimaging.org/publications/singraf/

  50. arXiv:2211.16677  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Neural Field Generation using Triplane Diffusion

    Authors: J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

    Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D t…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Project page: https://jryanshue.com/nfd