Showing 1–50 of 209 results for author: Guibas, L

  1. arXiv:2406.20055

    cs.CV cs.LG

    SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting

    Authors: Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J. Fleet, Andrea Tagliasacchi

    Abstract: 3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications. However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world c…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18717

    cs.CV

    Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

    Authors: Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas

    Abstract: Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet, these methods assume d…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.05897

    cs.CV

    InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping

    Authors: Yunchao Zhang, Guandao Yang, Leonidas Guibas, Yanchao Yang

    Abstract: 3D Gaussians, as a low-level scene representation, typically involve thousands to millions of Gaussians. This makes it difficult to control the scene in ways that reflect the underlying dynamic structure, where the number of independent entities is typically much smaller. In particular, it can be challenging to animate and move objects in the scene, which requires coordination among many Gaussians…

    Submitted 9 June, 2024; originally announced June 2024.

  4. arXiv:2405.19678

    cs.CV cs.AI

    View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields

    Authors: Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas

    Abstract: Large-scale vision foundation models such as Segment Anything (SAM) demonstrate impressive performance in zero-shot image segmentation at multiple levels of granularity. However, these zero-shot predictions are rarely 3D-consistent. As the camera viewpoint changes in a scene, so do the segmentation predictions, as well as the characterizations of "coarse" or "fine" granularity. In this work, we…

    Submitted 30 May, 2024; originally announced May 2024.

  5. arXiv:2405.17421

    cs.CV cs.GR

    MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

    Authors: Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce 4D Motion Scaffolds (MoSca), a neural information processing system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address such a challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models, lift the video data to a novel Motion Scaffold (MoSca) representation, whic…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/mosca

  6. arXiv:2405.17414

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene…

    Submitted 27 May, 2024; originally announced May 2024.

  7. arXiv:2404.17672

    cs.CV cs.GR

    BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

    Authors: Ian Huang, Guandao Yang, Leonidas Guibas

    Abstract: Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, mak…

    Submitted 22 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  8. arXiv:2404.11987

    cs.CV

    MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

    Authors: Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas

    Abstract: We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and effectively eliminates penetration issues between the two individuals. We devise a pipelin…

    Submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2404.08636

    cs.CV

    Probing the 3D Awareness of Visual Foundation Models

    Authors: Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani

    Abstract: Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also repr…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project page: https://github.com/mbanani/probe3d

  10. arXiv:2404.04421

    cs.GR cs.CV

    PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

    Authors: Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

    Abstract: Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along w…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Project Page: https://qingqing-zhao.github.io/PhysAvatar

  11. arXiv:2404.01440

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  12. arXiv:2403.12038

    cs.CV

    Zero-Shot Image Feature Consensus with Deep Functional Maps

    Authors: Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas

    Abstract: Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. Existing work has attempted to improve the quality of these correspondence maps by carefully mixing features from different sources, such as by combining…

    Submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2403.12032

    cs.CV cs.GR

    Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

    Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

    Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denois…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

  14. arXiv:2402.15321

    cs.CV cs.AI cs.LG

    OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

    Authors: Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, et al. (3 additional authors not shown)

    Abstract: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the chall…

    Submitted 17 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

  15. arXiv:2401.12168

    cs.CV cs.CL cs.LG cs.RO

    SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

    Authors: Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

    Abstract: Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size differences. We hyp…

    Submitted 22 January, 2024; originally announced January 2024.

  16. arXiv:2401.10822

    cs.CV

    ActAnywhere: Subject-Aware Video Background Generation

    Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang

    Abstract: Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground subject, while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process which tra…

    Submitted 19 January, 2024; originally announced January 2024.

  17. arXiv:2401.08140

    cs.CV

    ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process

    Authors: Kiyohiro Nakayama, Mikaela Angelina Uy, Yang You, Ke Li, Leonidas Guibas

    Abstract: Neural radiance fields (NeRFs) have gained popularity across various applications. However, they face challenges in the sparse view setting, lacking sufficient constraints from volume rendering. Reconstructing and understanding a 3D scene from sparse and unconstrained cameras is a long-standing problem in classical computer vision with diverse applications. While recent works have explored NeRFs i…

    Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  18. arXiv:2401.04092

    cs.CV

    GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

    Authors: Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

    Abstract: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligned with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Project page: https://gpteval3d.github.io/ ; Code: https://github.com/3DTopia/GPTEval3D

  19. arXiv:2401.00850

    cs.CV cs.AI

    Refining Pre-Trained Motion Models

    Authors: Xinglong Sun, Adam W. Harley, Leonidas J. Guibas

    Abstract: Given the difficulty of manually annotating motion in video, the current best motion estimation methods are trained with synthetic data, and therefore struggle somewhat due to a train/test gap. Self-supervised methods hold the promise of training directly on real video, but typically perform worse. These include methods trained with warp error (i.e., color constancy) combined with smoothness terms…

    Submitted 16 February, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted at ICRA 2024

  20. arXiv:2312.15610

    cs.CV

    Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks

    Authors: Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas J. Guibas

    Abstract: Some extremely low-dimensional yet crucial geometric eigen-lengths often determine the success of some geometric tasks. For example, the height of an object is important to measure to check if it can fit between the shelves of a cabinet, while the width of a couch is crucial when trying to move it through a doorway. Humans have materialized such crucial geometric eigen-lengths in common sense sinc…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: ICML 2023. Project page: https://yijiaweng.github.io/geo-eigen-length

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36958-36977, 2023

  21. arXiv:2312.15130

    cs.CV

    PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

    Authors: Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu

    Abstract: Pose estimation is a crucial task in computer vision and robotics, enabling the tracking and manipulation of objects in images or videos. While several datasets exist for pose estimation, there is a lack of large-scale datasets specifically focusing on cluttered scenes with occlusions. We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the d…

    Submitted 31 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  22. arXiv:2312.06663

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics and gaming applications gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/

  23. arXiv:2312.01307

    cs.RO cs.CV

    SAGE: Bridging Semantic and Actionable Parts for GEneralizable Manipulation of Articulated Objects

    Authors: Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas

    Abstract: To interact with daily-life articulated objects of diverse structures and functionalities, understanding the object parts plays a central role in both user instruction comprehension and task execution. However, the possible discordance between the semantic meaning and physics functionalities of the parts poses a challenge for designing a general system. To address this problem, we propose SAGE, a…

    Submitted 30 March, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  24. arXiv:2311.17984

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

  25. arXiv:2311.16504

    cs.CV cs.GR

    Rethinking Directional Integration in Neural Radiance Fields

    Authors: Congyue Deng, Jiawei Yang, Leonidas Guibas, Yue Wang

    Abstract: Recent works use the Neural radiance field (NeRF) to perform multi-view 3D reconstruction, providing a significant leap in rendering photorealistic scenes. However, despite its efficacy, NeRF exhibits limited capability of learning view-dependent effects compared to light field rendering or image-based view synthesis. To that end, we introduce a modification to the NeRF rendering equation which is…

    Submitted 28 November, 2023; originally announced November 2023.

  26. arXiv:2311.02787

    cs.RO cs.AI

    Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools

    Authors: Yang You, Bokui Shen, Congyue Deng, Haoran Geng, Songlin Wei, He Wang, Leonidas Guibas

    Abstract: Deformable object manipulation stands as one of the most captivating yet formidable challenges in robotics. While previous techniques have predominantly relied on learning latent dynamics through demonstrations, typically represented as either particles or images, there exists a pertinent limitation: acquiring suitable demonstrations, especially for long-horizon tasks, can be elusive. Moreover, ba…

    Submitted 24 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 8 pages

  27. arXiv:2310.20685

    cs.CV

    NeRF Revisited: Fixing Quadrature Instability in Volume Rendering

    Authors: Mikaela Angelina Uy, Kiyohiro Nakayama, Guandao Yang, Rahul Krishna Thomas, Leonidas Guibas, Ke Li

    Abstract: Neural radiance fields (NeRF) rely on volume rendering to synthesize novel views. Volume rendering requires evaluating an integral along each ray, which is numerically approximated with a finite sum that corresponds to the exact integral along the ray under piecewise constant volume density. As a consequence, the rendered result is unstable w.r.t. the choice of samples along the ray, a phenomenon…

    Submitted 19 January, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  28. arXiv:2310.16838

    cs.RO cs.CV

    SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation

    Authors: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

    Abstract: Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel DFF for 3D scenes utilizing large 2D vision models to extract semantic features…

    Submitted 18 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  29. arXiv:2310.16050

    cs.RO

    EquivAct: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object Manipulation

    Authors: Jingyun Yang, Congyue Deng, Jimmy Wu, Rika Antonova, Leonidas Guibas, Jeannette Bohg

    Abstract: If a robot masters folding a kitchen towel, we would expect it to master folding a large beach towel. However, existing policy learning methods that rely on data augmentation still don't guarantee such generalization. Our insight is to add equivariance to both the visual object representation and policy architecture. We propose EquivAct which utilizes SIM(3)-equivariant network structures that gua…

    Submitted 14 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICRA 2024; The first two authors contributed equally

  30. arXiv:2310.15928

    cs.RO

    AO-Grasp: Articulated Object Grasp Generation

    Authors: Carlota Parés Morlans, Claire Chen, Yijia Weng, Michelle Yi, Yuying Huang, Nick Heppert, Linqi Zhou, Leonidas Guibas, Jeannette Bohg

    Abstract: We introduce AO-Grasp, a grasp proposal method that generates 6 DoF grasps that enable robots to interact with articulated objects, such as opening and closing cabinets and appliances. AO-Grasp consists of two main contributions: the AO-Grasp Model and the AO-Grasp Dataset. Given a segmented partial point cloud of a single articulated object, the AO-Grasp Model predicts the best grasp points on th…

    Submitted 18 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Project website: https://stanford-iprl-lab.github.io/ao-grasp

  31. arXiv:2310.06992

    cs.CV

    Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

    Authors: Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki

    Abstract: Object tracking is central to robot perception and scene understanding. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This begs the question: can we re-purpose these large-scale pre-trained…

    Submitted 25 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Project page available at https://wenhsuanchu.github.io/ovtracktor/

  32. OptCtrlPoints: Finding the Optimal Control Points for Biharmonic 3D Shape Deformation

    Authors: Kunho Kim, Mikaela Angelina Uy, Despoina Paschalidou, Alec Jacobson, Leonidas J. Guibas, Minhyuk Sung

    Abstract: We propose OptCtrlPoints, a data-driven framework designed to identify the optimal sparse set of control points for reproducing target shapes using biharmonic 3D shape deformation. Control-point-based 3D deformation methods are widely utilized for interactive shape editing, and their usability is enhanced when the control points are sparse yet strategically distributed across the shape. With this…

    Submitted 13 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Pacific Graphics 2023 (Full Paper). Project page: https://soulmates2.github.io/publications/OptCtrlPoints/

  33. arXiv:2309.03468

    cs.CV cs.AI cs.LG

    Cross-Image Context Matters for Bongard Problems

    Authors: Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas

    Abstract: Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Main paper: 7 pages, Appendix: 10 pages, 30 figures. Code: https://github.com/nraghuraman/bongard-context

  34. arXiv:2307.15055

    cs.CV

    PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

    Authors: Yang Zheng, Adam W. Harley, Bokui Shen, Gordon Wetzstein, Leonidas J. Guibas

    Abstract: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to m…

    Submitted 27 July, 2023; originally announced July 2023.

  35. arXiv:2307.07511

    cs.CV

    NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

    Authors: Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas

    Abstract: We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plau…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project Page with additional results available https://nileshkulkarni.github.io/nifty

  36. arXiv:2306.06212

    cs.CV cs.GR

    Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions

    Authors: Ian Huang, Vrishab Krishna, Omoruyi Atekha, Leonidas Guibas

    Abstract: What constitutes the "vibe" of a particular scene? What should one find in "a busy, dirty city street", "an idyllic countryside", or "a crime scene in an abandoned living room"? The translation from abstract scene descriptions to stylized scene elements cannot be done with any generality by extant systems trained on rigid and limited indoor datasets. In this paper, we propose to leverage the knowl…

    Submitted 9 June, 2023; originally announced June 2023.

  37. arXiv:2305.16315

    cs.CV

    NAP: Neural 3D Articulation Prior

    Authors: Jiahui Lei, Congyue Deng, Bokui Shen, Leonidas Guibas, Kostas Daniilidis

    Abstract: We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model to synthesize 3D articulated object models. Despite the extensive research on generating 3D objects, compositions, or scenes, there remains a lack of focus on capturing the distribution of articulated objects, a common object category for human and robot interaction. To generate articulated objects, we first design a…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/nap

  38. arXiv:2305.16314

    cs.CV

    Banana: Banach Fixed-Point Network for Pointcloud Segmentation with Inter-Part Equivariance

    Authors: Congyue Deng, Jiahui Lei, Bokui Shen, Kostas Daniilidis, Leonidas Guibas

    Abstract: Equivariance has gained strong interest as a desirable network property that inherently ensures robust generalization. However, when dealing with complex systems such as articulated objects or multi-object scenes, effectively capturing inter-part transformations poses a challenge, as it becomes entangled with the overall structure and local transformations. The interdependence of part assignment a…

    Submitted 26 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  39. Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization

    Authors: Connor Z. Lin, Koki Nagano, Jan Kautz, Eric R. Chan, Umar Iqbal, Leonidas Guibas, Gordon Wetzstein, Sameh Khamis

    Abstract: There is a growing demand for the accessible creation of high-quality 3D avatars that are animatable and customizable. Although 3D morphable models provide intuitive control for editing and animation, and robustness for single-view face reconstruction, they cannot easily capture geometric and appearance details. Methods based on neural implicit representations, such as signed distance functions (S…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH 2023, Project Page: https://research.nvidia.com/labs/toronto-ai/ssif

  40. arXiv:2305.01921

    cs.CV

    DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion

    Authors: Kiyohiro Nakayama, Mikaela Angelina Uy, Jiahui Huang, Shi-Min Hu, Ke Li, Leonidas J Guibas

    Abstract: While the community of 3D point cloud generation has witnessed a big growth in recent years, there is still no effective way to enable intuitive user control in the generation process, hence limiting the general utility of such methods. Since an intuitive way of decomposing a shape is through its parts, we propose to tackle the task of controllable part-based point cloud generation. We introduc…

    Submitted 20 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

  41. arXiv:2304.14473

    cs.CV cs.AI cs.LG

    Learning a Diffusion Prior for NeRFs

    Authors: Guandao Yang, Abhijit Kundu, Leonidas J. Guibas, Jonathan T. Barron, Ben Poole

    Abstract: Neural Radiance Fields (NeRFs) have emerged as a powerful neural 3D representation for objects and scenes derived from 2D data. Generating NeRFs, however, remains difficult in many scenarios. For instance, training a NeRF with only a small number of views as supervision remains challenging since it is an under-constrained problem. In such settings, it calls for some inductive prior to filter out b…

    Submitted 27 April, 2023; originally announced April 2023.

  42. arXiv:2304.02163  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

    Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov

    Abstract: Modeling the 3D world from sensor data for simulation is a scalable way of developing testing and validation environments for robotic learning problems such as autonomous driving. However, manually creating or re-creating real-world-like environments is difficult, expensive, and not scalable. Recent generative model techniques have shown promising progress to address such challenges by learning 3D…

    Submitted 28 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023; Our WOD-ObjectAsset can be accessed through waymo.com/open

  43. arXiv:2304.01732  [pdf, other]

    physics.comp-ph cs.LG physics.flu-dyn

    Adaptive learning of effective dynamics: Adaptive real-time, online modeling for complex systems

    Authors: Ivica Kičić, Pantelis R. Vlachas, Georgios Arampatzis, Michail Chatzimanolakis, Leonidas Guibas, Petros Koumoutsakos

    Abstract: Predictive simulations are essential for applications ranging from weather forecasting to material design. The veracity of these simulations hinges on their capacity to capture the effective system dynamics. Massively parallel simulations predict the systems dynamics by resolving all spatiotemporal scales, often at a cost that prevents experimentation. On the other hand, reduced order models are f…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 34 pages

  44. arXiv:2304.00341  [pdf, other]

    cs.CV

    JacobiNeRF: NeRF Shaping with Mutual Information Gradients

    Authors: Xiaomeng Xu, Yanchao Yang, Kaichun Mo, Boxiao Pan, Li Yi, Leonidas Guibas

    Abstract: We propose a method that trains a neural radiance field (NeRF) to encode not only the appearance of the scene but also semantic correlations between scene points, regions, or entities -- aiming to capture their mutual co-variation patterns. In contrast to the traditional first-order photometric reconstruction objective, our method explicitly regularizes the learning dynamics to align the Jacobians…

    Submitted 1 April, 2023; originally announced April 2023.

  45. arXiv:2303.17968  [pdf, other]

    cs.CV

    VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization

    Authors: Bingfan Zhu, Yanchao Yang, Xulong Wang, Youyi Zheng, Leonidas Guibas

    Abstract: We propose VDN-NeRF, a method to train neural radiance fields (NeRFs) for better geometry under non-Lambertian surface and dynamic lighting conditions that cause significant variation in the radiance of a point when viewed from different angles. Instead of explicitly modeling the underlying factors that result in the view-dependent phenomenon, which could be complex yet not inclusive, we develop a…

    Submitted 31 March, 2023; originally announced March 2023.

  46. arXiv:2303.15440  [pdf, other]

    cs.CV

    EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision

    Authors: Jiahui Lei, Congyue Deng, Karl Schmeckpeper, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce Equivariant Neural Field Expectation Maximization (EFEM), a simple, effective, and robust geometric algorithm that can segment objects in 3D scenes without annotations or training on scenes. We achieve such unsupervised segmentation by exploiting single object shape priors. We make two novel steps in that direction. First, we introduce equivariant shape representations to this problem…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023, project page https://www.cis.upenn.edu/~leijh/projects/efem

  47. arXiv:2303.13634  [pdf, other]

    cs.LG cs.CE

    Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity

    Authors: Ali Kashefi, Leonidas J. Guibas, Tapan Mukerji

    Abstract: Regular physics-informed neural networks (PINNs) predict the solution of partial differential equations using sparse labeled data but only over a single domain. On the other hand, fully supervised learning models are first trained usually over a few thousand domains with known solutions (i.e., labeled data) and then predict the solution over a few hundred unseen domains. Physics-informed PointNet…

    Submitted 18 September, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  48. arXiv:2303.13582  [pdf, other]

    cs.CV

    SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates

    Authors: Mikaela Angelina Uy, Ricardo Martin-Brualla, Leonidas Guibas, Ke Li

    Abstract: Neural radiance fields (NeRFs) have enabled high fidelity 3D reconstruction from multiple 2D input views. However, a well-known drawback of NeRFs is the less-than-ideal performance under a small number of views, due to insufficient constraints enforced by volumetric rendering. To address this issue, we introduce SCADE, a novel technique that improves NeRF reconstruction quality on sparse, unconstr…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  49. arXiv:2303.12074  [pdf, other]

    cs.CV

    CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

    Abstract: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D l…

    Submitted 8 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; Webpage: https://sherwinbahmani.github.io/cc3d/

  50. arXiv:2303.12050  [pdf, other]

    cs.CV

    CurveCloudNet: Processing Point Clouds with 1D Structure

    Authors: Colton Stearns, Davis Rempe, Jiateng Liu, Alex Fu, Sebastien Mascha, Jeong Joon Park, Despoina Paschalidou, Leonidas J. Guibas

    Abstract: Modern depth sensors such as LiDAR operate by sweeping laser-beams across the scene, resulting in a point cloud with notable 1D curve-like structures. In this work, we introduce a new point cloud processing scheme and backbone, called CurveCloudNet, which takes advantage of the curve-like structure inherent to these sensors. While existing backbones discard the rich 1D traversal patterns and rely…

    Submitted 1 February, 2024; v1 submitted 21 March, 2023; originally announced March 2023.