Showing 1–50 of 179 results for author: Pollefeys, M

  1. arXiv:2406.19811

    cs.CV

    EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

    Authors: Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

    Abstract: Human activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object inter…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.05849

    cs.RO

    MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

    Authors: Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni

    Abstract: Creating 3D semantic reconstructions of environments is fundamental to many applications, especially when related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene in the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained and high-re…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2406.04340

    cs.CV

    GLACE: Global Local Accelerated Coordinate Encoding

    Authors: Fangjinghua Wang, Xudong Jiang, Silvano Galliani, Christoph Vogel, Marc Pollefeys

    Abstract: Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and ne…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Large-scale visual localization with a single optimizable MLP. CVPR 2024. Code: https://github.com/cvg/glace. Project page: https://xjiangan.github.io/glace

  4. arXiv:2406.03175

    cs.CV

    Dynamic 3D Gaussian Fields for Urban Areas

    Authors: Tobias Fischer, Jonas Kulhanek, Samuel Rota Bulò, Lorenzo Porzi, Marc Pollefeys, Peter Kontschieder

    Abstract: We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas. Existing works are not well suited for applications like mixed-reality or closed-loop simulation due to their limited visual quality and non-interactive rendering speeds. Recently, rasterization-based approaches have achieved high-quality NVS at impressive speeds. However, the…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Project page is available at https://tobiasfshr.github.io/pub/4dgf/

  5. arXiv:2405.20141

    cs.CV

    OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

    Authors: Gonca Yilmaz, Songyou Peng, Francis Engelmann, Marc Pollefeys, Hermann Blum

    Abstract: The advent of Vision Language Models (VLMs) transformed image understanding from closed-set classifications to dynamic image-language interactions, enabling open-vocabulary segmentation. Despite this flexibility, VLMs often fall behind closed-set classifiers in accuracy due to their reliance on ambiguous image captions and lack of domain-specific knowledge. We, therefore, introduce a new task doma…

    Submitted 30 May, 2024; originally announced May 2024.

  6. arXiv:2405.19295

    cs.CV

    3D Neural Edge Reconstruction

    Authors: Lei Li, Songyou Peng, Zehao Yu, Shaohui Liu, Rémi Pautrat, Xiaochuan Yin, Marc Pollefeys

    Abstract: Real-world objects and environments are predominantly composed of edge features, including straight lines and curves. Such edges are crucial elements for various applications, such as CAD modeling, surface meshing, lane mapping, etc. However, existing traditional methods only prioritize lines over curves for simplicity in geometric modeling. To this end, we introduce EMAP, a new method for learnin…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://neural-edge-map.github.io

  7. arXiv:2405.18715

    cs.CV

    NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

    Authors: Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng

    Abstract: Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusi…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: CVPR 2024, first two authors contributed equally. Project Page: https://rwn17.github.io/nerf-on-the-go/

  8. arXiv:2405.10255

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  9. arXiv:2405.01333

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  10. arXiv:2404.18025

    cs.CV

    Retrieval Robust to Object Motion Blur

    Authors: Rong Zou, Marc Pollefeys, Denys Rozumnyi

    Abstract: Moving objects are frequently seen in daily life and usually appear blurred in images due to their motion. While general object retrieval is a widely explored area in computer vision, it primarily focuses on sharp and static objects, and retrieval of motion-blurred objects in large image collections remains unexplored. We propose a method for object retrieval in images that are affected by motion…

    Submitted 27 April, 2024; originally announced April 2024.

  11. arXiv:2404.16552

    cs.CV

    Efficient Solution of Point-Line Absolute Pose

    Authors: Petr Hruby, Timothy Duff, Marc Pollefeys

    Abstract: We revisit certain problems of pose estimation based on 3D--2D correspondences between features which may be points or lines. Specifically, we address the two previously-studied minimal problems of estimating camera extrinsics from $p \in \{ 1, 2 \}$ point--point correspondences and $l=3-p$ line--line correspondences. To the best of our knowledge, all of the previously-known practical solutions to…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, 11 pages, 8 figures, 5 tables

    MSC Class: 68T45 ACM Class: I.4.5

  12. arXiv:2404.14565

    cs.CV

    "Where am I?" Scene Retrieval with Language

    Authors: Jiaqi Chen, Daniel Barath, Iro Armeni, Marc Pollefeys, Hermann Blum

    Abstract: Natural language interfaces to embodied AI are becoming more ubiquitous in our daily lives. This opens further opportunities for language-based interaction with embodied agents, such as a user instructing an agent to execute some task in a specific location. For example, "put the bowls back in the cupboard next to the fridge" or "meet me at the intersection under the red sign." As such, we need me…

    Submitted 22 April, 2024; originally announced April 2024.

  13. arXiv:2404.12440

    cs.RO cs.CV

    Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

    Authors: Oliver Lemke, Zuria Bauer, René Zurbrügg, Marc Pollefeys, Francis Engelmann, Hermann Blum

    Abstract: In recent years, modern techniques in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This allows for accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, as well as robust and repeatable robotic manipulation. This work aims to integrate these recent methods into a comprehe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted at ICRA 2024 Workshops. Code and videos available at https://spot-compose.github.io/

    ACM Class: I.2.9; I.2.10

  14. arXiv:2404.03658

    cs.CV

    Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

    Authors: Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari

    Abstract: Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://ruili3.github.io/kyn

  15. arXiv:2404.03650

    cs.CV

    OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views

    Authors: Francis Engelmann, Fabian Manhardt, Michael Niemeyer, Keisuke Tateno, Marc Pollefeys, Federico Tombari

    Abstract: Large visual-language models (VLMs), like CLIP, enable open-set image segmentation to segment arbitrary concepts from an image in a zero-shot manner. This goes beyond the traditional closed-set assumption, i.e., where models can only segment classes from a pre-defined training set. More recently, first works on open-set segmentation in 3D scenes have appeared in the literature. These methods are h…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: ICLR 2024, Project page: https://opennerf.github.io

    Journal ref: ICLR 2024

  16. arXiv:2404.02152

    cs.CV

    GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

    Authors: Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui

    Abstract: Recently, we have witnessed the explosive growth of various volumetric representations in modeling animatable head avatars. However, due to the diversity of frameworks, there is no practical method to support high-level applications like 3D head avatar editing across different representations. In this paper, we propose a generic avatar editing approach that can be universally applied to various 3D…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project page: https://zju3dv.github.io/geneavatar/

  17. arXiv:2404.00469

    cs.CV

    SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

    Authors: Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth

    Abstract: We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases.…

    Submitted 30 March, 2024; originally announced April 2024.

  18. arXiv:2404.00429

    cs.CV

    Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

    Authors: Shengze Jin, Iro Armeni, Marc Pollefeys, Daniel Barath

    Abstract: We introduce a novel framework for multiway point cloud mosaicking (named Wednesday), designed to co-align sets of partially overlapping point clouds -- typically obtained from 3D scanners or moving RGB-D cameras -- into a unified coordinate system. At the core of our approach is ODIN, a learned pairwise registration algorithm that iteratively identifies overlaps and refines attention scores, empl…

    Submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2404.00168

    cs.CV

    Multi-Level Neural Scene Graphs for Dynamic Urban Environments

    Authors: Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò, Marc Pollefeys, Peter Kontschieder

    Abstract: We estimate the radiance field of large-scale dynamic areas from multiple vehicle captures under varying environmental conditions. Previous works in this domain are either restricted to static environments, do not scale to more than a single short video, or struggle to separately represent dynamic object instances. To this end, we present a novel, decomposable radiance field approach for dynamic u…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page is available at https://tobiasfshr.github.io/pub/ml-nsg/

  20. arXiv:2403.16736

    cs.CV

    Creating a Digital Twin of Spinal Surgery: A Proof of Concept

    Authors: Jonas Hein, Frédéric Giraud, Lilian Calvet, Alexander Schwarz, Nicola Alessandro Cavalcanti, Sergey Prokudin, Mazda Farshad, Siyu Tang, Marc Pollefeys, Fabio Carrillo, Philipp Fürnstahl

    Abstract: Surgery digitalization is the process of creating a virtual replica of real-world surgery, also referred to as a surgical digital twin (SDT). It has significant applications in various fields such as education and training, surgical planning, and automation of surgical tasks. In addition, SDTs are an ideal foundation for machine learning methods, enabling the automatic generation of training data.…

    Submitted 22 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted for the DCA in MI Workshop @ CVPR 2024. Project page: https://jonashein.github.io/surgerydigitization/

  21. arXiv:2403.15313

    cs.CV cs.AI

    CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking

    Authors: Nicolas Baumann, Michael Baumgartner, Edoardo Ghignone, Jonas Kühne, Tobias Fischer, Yung-Hsu Yang, Marc Pollefeys, Michele Magno

    Abstract: Accurate detection and tracking of surrounding objects is essential to enable self-driving vehicles. While Light Detection and Ranging (LiDAR) sensors have set the benchmark for high performance, the appeal of camera-only solutions lies in their cost-effectiveness. Notably, despite the prevalent use of Radio Detection and Ranging (RADAR) sensors in automotive systems, their potential in 3D detecti…

    Submitted 22 March, 2024; originally announced March 2024.

  22. arXiv:2403.14627

    cs.CV

    MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

    Authors: Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

    Abstract: We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian prim…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://donydchen.github.io/mvsplat Code: https://github.com/donydchen/mvsplat

  23. arXiv:2403.03370

    cs.CV cs.RO

    F$^3$Loc: Fusion and Filtering for Floorplan Localization

    Authors: Changan Chen, Rui Wang, Christoph Vogel, Marc Pollefeys

    Abstract: In this paper we propose an efficient data-driven solution to self-localization within a floorplan. Floorplan data is readily available, long-term persistent and inherently robust to changes in the visual appearance. Our method does not require retraining per map and location or demand a large database of images of the area of interest. We propose a novel probabilistic model consisting of an obser…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 10 pages, 11 figures, accepted to CVPR 2024

  24. arXiv:2402.15321

    cs.CV cs.AI cs.LG

    OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

    Authors: Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, et al. (3 additional authors not shown)

    Abstract: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the chall…

    Submitted 17 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

  25. arXiv:2401.10786

    cs.CV

    Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

    Authors: Zuoyue Li, Zhenqiang Li, Zhaopeng Cui, Marc Pollefeys, Martin R. Oswald

    Abstract: Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications like games and map services. However, challenges arise from significant view changes and scene scale. Previous efforts mainly focused on image or video generation, lacking exploration into the adaptability of scene generation for arbitrary views. Existing 3D generation works either ope…

    Submitted 1 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Journal ref: CVPR 2024

  26. arXiv:2401.08739

    cs.CV cs.AI

    EgoGen: An Egocentric Synthetic Data Generator

    Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

    Abstract: Understanding the world in first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural hum…

    Submitted 11 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024 (Oral). 23 pages, 17 figures. Project page: https://ego-gen.github.io/

  27. arXiv:2401.03771

    cs.CV

    NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

    Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald

    Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and d…

    Submitted 8 January, 2024; originally announced January 2024.

  28. arXiv:2401.01887

    cs.CV

    LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

    Authors: Weirong Chen, Le Chen, Rui Wang, Marc Pollefeys

    Abstract: Visual odometry estimates the motion of a moving camera based on visual input. Existing methods, mostly focusing on two-view point tracking, often ignore the rich temporal context in the image sequence, thereby overlooking the global motion patterns and providing no assessment of the full trajectory reliability. These shortcomings hinder performance in scenarios with occlusion, dynamic objects, an…

    Submitted 12 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to CVPR 2024. Project page: https://chiaki530.github.io/projects/leapvo

  29. arXiv:2312.17232

    cs.CV

    Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

    Authors: Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann

    Abstract: Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Importantly, models trained on this data typically struggle to recognize object classes beyond the annotated classes, i.e., they do not generalize well to unseen domains and require additional domain-specific annot…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Project Page: http://segment3d.github.io

  30. arXiv:2312.13285

    cs.CV

    UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections

    Authors: Fangjinghua Wang, Marie-Julie Rakotosaona, Michael Niemeyer, Richard Szeliski, Marc Pollefeys, Federico Tombari

    Abstract: Neural 3D scene representations have shown great potential for 3D reconstruction from 2D images. However, reconstructing real-world captures of complex scenes still remains a challenge. Existing generic 3D reconstruction methods often struggle to represent fine geometric details and do not adequately model reflective surfaces of large-scale scenes. Techniques that explicitly focus on reflective su…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Project page: https://fangjinghuawang.github.io/UniSDF

  31. arXiv:2312.04565

    cs.CV

    MuRF: Multi-Baseline Radiance Fields

    Authors: Haofei Xu, Anpei Chen, Yuedong Chen, Christos Sakaridis, Yulun Zhang, Marc Pollefeys, Andreas Geiger, Fisher Yu

    Abstract: We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to solving sparse view synthesis under multiple different baseline settings (small and large baselines, and different number of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane, and accordingly construct a target view frustum volume. Such a target…

    Submitted 9 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: CVPR 2024, Project Page: https://haofeixu.github.io/murf/, Code: https://github.com/autonomousvision/murf

  32. arXiv:2311.18068

    cs.CV

    ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction

    Authors: Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito Seki, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the…

    Submitted 3 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  33. arXiv:2311.17491

    cs.CV

    Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation

    Authors: Yu Zheng, Guangming Wang, Jiuming Liu, Marc Pollefeys, Hesheng Wang

    Abstract: LiDAR point cloud semantic segmentation enables the robots to obtain fine-grained semantic information of the surrounding environment. Recently, many works project the point cloud onto the 2D image and adopt the 2D Convolutional Neural Networks (CNNs) or vision transformer for LiDAR point cloud semantic segmentation. However, since more than one point can be projected onto the same 2D position but…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 17 pages, 10 figures, under review

  34. arXiv:2311.12174

    cs.CV

    LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories

    Authors: Silvan Weder, Hermann Blum, Francis Engelmann, Marc Pollefeys

    Abstract: Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-ar…

    Submitted 20 November, 2023; originally announced November 2023.

  35. arXiv:2311.11016

    cs.RO

    SNI-SLAM: Semantic Neural Implicit SLAM

    Authors: Siting Zhu, Guangming Wang, Hermann Blum, Jiuming Liu, Liang Song, Marc Pollefeys, Hesheng Wang

    Abstract: We propose SNI-SLAM, a semantic SLAM system utilizing neural implicit representation, that simultaneously performs accurate semantic mapping, high-quality surface reconstruction, and robust camera tracking. In this system, we introduce hierarchical semantic representation to allow multi-level semantic comprehension for top-down structured semantic mapping of the scene. In addition, to fully utiliz…

    Submitted 27 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024

  36. arXiv:2311.09346

    cs.CV cs.LG cs.RO

    Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change

    Authors: Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, Iro Armeni

    Abstract: Building 3D geometric maps of man-made spaces is a well-established and active field that is fundamental to computer vision and robotics. However, considering the evolving nature of built environments, it is essential to question the capabilities of current mapping efforts in handling temporal changes. In addition, spatiotemporal mapping holds significant potential for achieving sustainability and…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 27 pages, 29 figures. For the project page, see http://nothing-stands-still.com

  37. arXiv:2311.03345

    cs.CV

    Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

    Authors: Zador Pataki, Mohammad Altillawi, Menelaos Kanakis, Rémi Pautrat, Fengyi Shen, Ziyuan Liu, Luc Van Gool, Marc Pollefeys

    Abstract: Modern learning-based visual feature extraction networks perform well in intra-domain localization, however, their performance significantly declines when image pairs are captured across long-term visual domain variations, such as different seasonal and daytime variations. In this paper, our first contribution is a benchmark to investigate the performance impact of long-term variations on visual l…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 14 pages + 5 pages appendix, 13 figures

  38. arXiv:2310.06984

    cs.CV cs.RO

    Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

    Authors: Le Chen, Weirong Chen, Rui Wang, Marc Pollefeys

    Abstract: As a promising fashion for visual localization, scene coordinate regression (SCR) has seen tremendous progress in the past decade. Most recent methods usually adopt neural networks to learn the mapping from image pixels to 3D scene coordinates, which requires a vast amount of annotated training data. We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for SCR. Despite…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures

  39. arXiv:2310.05133

    cs.CV cs.LG

    Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

    Authors: Dominik Hollidt, Clinton Wang, Polina Golland, Marc Pollefeys

    Abstract: We present a novel approach to perform 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene which is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentati…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages

  40. arXiv:2310.02650

    cs.RO cs.CV

    Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

    Authors: Matthew Hanlon, Boyang Sun, Marc Pollefeys, Hermann Blum

    Abstract: Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map of another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing e.g. a ground robot in the map of a drone or head-mo…

    Submitted 8 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  41. arXiv:2310.02392

    cs.RO

    A 3D Mixed Reality Interface for Human-Robot Teaming

    Authors: Jiaqi Chen, Boyang Sun, Marc Pollefeys, Hermann Blum

    Abstract: This paper presents a mixed-reality human-robot teaming system. It allows human operators to see in real-time where robots are located, even if they are not in line of sight. The operator can also visualize the map that the robots create of their environment and can easily send robots to new goal positions. The system mainly consists of a mapping and a control module. The mapping module is a real-…

    Submitted 3 October, 2023; originally announced October 2023.

  42. arXiv:2309.17024  [pdf, other]

    cs.CV

    HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

    Authors: Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys

    Abstract: Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale e…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  43. arXiv:2309.16040  [pdf, other]

    cs.CV

    Handbook on Leveraging Lines for Two-View Relative Pose Estimation

    Authors: Petr Hruby, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Daniel Barath

    Abstract: We propose an approach for estimating the relative pose between calibrated image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner. We investigate all possible configurations where these data modalities can be used together and review the minimal solvers available in the literature. Our hybrid framework combines the advantages of all configurations, enabling robu…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 2 view relative pose from special configurations of line

    MSC Class: 68T45 ACM Class: I.4.5; I.4.8

  44. arXiv:2309.16023  [pdf, other]

    cs.CV

    Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

    Authors: Shengze Jin, Daniel Barath, Marc Pollefeys, Iro Armeni

    Abstract: Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching and, as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While it is an indispensable component of these methods, it prevents a fully end-to-end traini…

    Submitted 27 September, 2023; originally announced September 2023.

  45. arXiv:2309.14737  [pdf, other]

    cs.RO cs.CV

    Volumetric Semantically Consistent 3D Panoptic Mapping

    Authors: Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath

    Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic…

    Submitted 5 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 2 figures

  46. arXiv:2309.10001  [pdf, other]

    cs.CV

    CaSAR: Contact-aware Skeletal Action Recognition

    Authors: Junan Lin, Zhichao Sun, Enjie Cao, Taein Kwon, Mahdi Rad, Marc Pollefeys

    Abstract: Skeletal Action recognition from an egocentric view is important for applications such as interfaces in AR/VR glasses and human-robot interaction, where the device has limited resources. Most of the existing skeletal action recognition approaches use 3D coordinates of hand joints and 8-corner rectangular bounding boxes of objects as inputs, but they do not capture how the hands and objects interac…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures

  47. arXiv:2309.06441  [pdf, other]

    cs.CV cs.AI cs.GR

    Learning Disentangled Avatars with Hybrid 3D Representations

    Authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black

    Abstract: Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have dif…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: home page: https://yfeng95.github.io/delta. arXiv admin note: text overlap with arXiv:2210.01868

  48. arXiv:2309.03160  [pdf, other]

    cs.CV

    ResFields: Residual Neural Fields for Spatiotemporal Signals

    Authors: Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang

    Abstract: Neural fields, a category of neural networks trained to represent high-frequency signals, have gained significant attention in recent years due to their impressive performance in modeling complex 3D data, such as signed distance (SDFs) or radiance fields (NeRFs), via a single multi-layer perceptron (MLP). However, despite the power and simplicity of representing signals with an MLP, these methods…

    Submitted 11 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: [ICLR 2024 Spotlight] Project and code at: https://markomih.github.io/ResFields/

  49. arXiv:2308.14713  [pdf, other]

    cs.CV

    R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

    Authors: Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu

    Abstract: Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We p…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023. Project page is available at https://www.vis.xyz/pub/r3d3/

  50. arXiv:2308.10694  [pdf, other]

    cs.CV

    Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction

    Authors: Rémi Pautrat, Shaohui Liu, Petr Hruby, Marc Pollefeys, Daniel Barath

    Abstract: We tackle the problem of estimating a Manhattan frame, i.e. three orthogonal vanishing points, and the unknown focal length of the camera, leveraging a prior vertical direction. The direction can come from an Inertial Measurement Unit that is a standard component of recent consumer devices, e.g., smartphones. We provide an exhaustive analysis of minimal line configurations and derive two new 2-lin…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023