Showing 1–50 of 58 results for author: Oswald, M R

  1. arXiv:2406.09415

    cs.CV cs.LG

    An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

    Authors: Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen

    Abstract: This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by directly treating each individual pixel as a token and achieve highly performant results. This is substantially different from the popular design in…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Technical report, 23 pages

  2. arXiv:2406.09126

    cs.CV

    Auto-Vocabulary Segmentation for LiDAR Points

    Authors: Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

    Abstract: Existing perception methods for autonomous driving fall short of recognizing unknown entities not covered in the training data. Open-vocabulary methods offer promising capabilities in detecting any object but are limited by user-specified queries representing target classes. We propose AutoVoc3D, a framework for automatic object class recognition and open-ended segmentation. Evaluation on nuScenes…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024 OpenSun3D Workshop

  3. arXiv:2405.16544

    cs.CV

    Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

    Authors: Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. Oswald, Federico Tombari

    Abstract: 3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neur…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 21 pages

  4. arXiv:2403.19549

    cs.CV cs.RO

    GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

    Authors: Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc Van Gool, Martin R. Oswald

    Abstract: Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without ne…

    Submitted 27 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2402.13255

    cs.CV cs.RO

    How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

    Authors: Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi

    Abstract: Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Spla…

    Submitted 11 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2402.09944

    cs.CV

    Loopy-SLAM: Dense Neural SLAM with Loop Closures

    Authors: Lorenzo Liso, Erik Sandström, Vladimir Yugay, Luc Van Gool, Martin R. Oswald

    Abstract: Neural RGBD SLAM techniques have shown promise in dense Simultaneous Localization And Mapping (SLAM), yet face challenges such as error accumulation during camera tracking resulting in distorted maps. In response, we introduce Loopy-SLAM that globally optimizes poses and the dense 3D model. We use frame-to-model tracking using a data-driven point-based submap generation method and trigger loop clo…

    Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  7. arXiv:2401.10786

    cs.CV

    Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

    Authors: Zuoyue Li, Zhenqiang Li, Zhaopeng Cui, Marc Pollefeys, Martin R. Oswald

    Abstract: Directly generating scenes from satellite imagery offers exciting possibilities for integration into applications like games and map services. However, challenges arise from significant view changes and scene scale. Previous efforts mainly focused on image or video generation, lacking exploration into the adaptability of scene generation for arbitrary views. Existing 3D generation works either ope…

    Submitted 1 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Journal ref: CVPR 2024

  8. arXiv:2401.03771

    cs.CV

    NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

    Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald

    Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and d…

    Submitted 8 January, 2024; originally announced January 2024.

  9. arXiv:2312.10217

    cs.CV

    T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

    Authors: Weijie Wei, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

    Abstract: The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, scholars have been actively investigating efficacious self-supervised pre-training paradigms. Nevertheless, temporal information, which is inherent in the LiDAR point cloud sequence, is consistently disregarded. To better utilize this property, we propose an effective pre-trai…

    Submitted 21 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Under review

  10. arXiv:2312.10070

    cs.CV cs.RO

    Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

    Authors: Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald

    Abstract: We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos. To this end, we propose a novel effective strategy for seeding new Gaussians for newly explored areas and their effective online optimization that is ind…

    Submitted 22 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  11. arXiv:2312.04539

    cs.CV

    Auto-Vocabulary Semantic Segmentation

    Authors: Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald

    Abstract: Open-ended image understanding tasks gained significant attention from the research community, particularly with the emergence of Vision-Language Models. Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, they operate without the need for training or fine-tuning. However, OVS methods typically require…

    Submitted 20 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  12. arXiv:2311.18512

    cs.CV cs.LG

    Revisiting Proposal-based Object Detection

    Authors: Aritra Bhowmik, Martin R. Oswald, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper revisits the pipeline for detecting objects in images with proposals. For any object detector, the obtained box proposals or queries need to be classified and regressed towards ground truth boxes. The common solution for the final predictions is to directly maximize the overlap between each proposal and the ground truth box, followed by a winner-takes-all ranking or non-maximum suppress…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 10 pages, 7 figures

  13. arXiv:2311.18068

    cs.CV

    ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction

    Authors: Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito Seki, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the…

    Submitted 3 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  14. arXiv:2310.07573

    cs.CV

    Relational Prior Knowledge Graphs for Detection and Instance Segmentation

    Authors: Osman Ülger, Yu Wang, Ysbrand Galama, Sezer Karaoglu, Theo Gevers, Martin R. Oswald

    Abstract: Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate the effectiveness of using such relationships for object detection and instance segmentation. To this end, we propose a Relational Prior-based Feature Enhancement Model (RP-FEM), a graph transformer that enhances object proposal featu…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Published in ICCV2023 SG2RL Workshop

  15. arXiv:2310.05920

    cs.CV

    SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation

    Authors: Duy-Kien Nguyen, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and/or pyramid design remain a key factor for their empirical success. In this paper, we show that this reliance on either feature…

    Submitted 15 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  16. arXiv:2310.00401

    cs.LG cs.RO

    Learning High-level Semantic-Relational Concepts for SLAM

    Authors: Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez

    Abstract: Recent works on SLAM extend their pose graphs with higher-level semantic concepts like Rooms exploiting relationships between them, to provide, not only a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs+), a pioneer in jointly leveraging semantic relationships in the factor optimizati…

    Submitted 22 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  17. arXiv:2309.17162

    cs.CV

    APNet: Urban-level Scene Segmentation of Aerial Images and Point Clouds

    Authors: Weijie Wei, Martin R. Oswald, Fatemeh Karimi Nejadasl, Theo Gevers

    Abstract: In this paper, we focus on semantic segmentation method for point clouds of urban scenes. Our fundamental concept revolves around the collaborative utilization of diverse scene representations to benefit from different context information and network architectures. To this end, the proposed network architecture, called APNet, is split into two branches: a point cloud branch and an aerial image bra…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV Workshop 2023 and selected as an oral

  18. Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery

    Authors: Florentin Liebmann, Marco von Atzigen, Dominik Stütz, Julian Wolf, Lukas Zingg, Daniel Suter, Laura Leoty, Hooman Esfandiari, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Mazda Farshad, Philipp Fürnstahl

    Abstract: Established surgical navigation systems for pedicle screw placement have been proven to be accurate, but still reveal limitations in registration or surgical guidance. Registration of preoperative data to the intraoperative anatomy remains a time-consuming, error-prone task that includes exposure to harmful radiation. Surgical guidance through conventional displays has well-known drawbacks, as inf…

    Submitted 5 August, 2023; originally announced August 2023.

  19. arXiv:2306.16917

    cs.CV cs.LG cs.RO

    The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes

    Authors: David Recasens, Martin R. Oswald, Marc Pollefeys, Javier Civera

    Abstract: Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume to observe also static scene parts besides deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pi…

    Submitted 29 June, 2023; originally announced June 2023.

  20. arXiv:2306.11048

    cs.CV

    UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM

    Authors: Erik Sandström, Kevin Ta, Luc Van Gool, Martin R. Oswald

    Abstract: We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM). Estimating pixel-wise uncertainties for the depth input of dense SLAM methods allows re-weighing the tracking and mapping losses towards image regions that contain more suitable information that is more reliable for SLAM. To this end, we propose an online framework for sensor uncertainty est…

    Submitted 6 September, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: ICCV 2023 Workshop. 20 pages, 9 figures

  21. arXiv:2306.05411

    cs.CV

    R-MAE: Regions Meet Masked Autoencoders

    Authors: Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

    Abstract: In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions. Specifically, we design an architecture which efficiently addresses the one-to-many mapping between images and regions,…

    Submitted 4 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  22. arXiv:2305.02398

    cs.CV

    Learning-based Relational Object Matching Across Views

    Authors: Cathrin Elich, Iro Armeni, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler

    Abstract: Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can benefit from reasoning on the level of objects. While keypoint-based matching can yield strong results for finding correspondences for images with small to medium vie…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in IEEE International Conference on Robotics and Automation (ICRA), 2023

    MSC Class: 68T45 ACM Class: I.2.10; I.4.8

  23. arXiv:2304.06419

    cs.CV cs.GR

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald

    Abstract: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles…

    Submitted 13 April, 2023; originally announced April 2023.

  24. arXiv:2304.04278

    cs.CV

    Point-SLAM: Dense Neural Point Cloud-based SLAM

    Authors: Erik Sandström, Yue Li, Luc Van Gool, Martin R. Oswald

    Abstract: We propose a dense neural simultaneous localization and mapping (SLAM) approach for monocular RGBD input which anchors the features of a neural scene representation in a point cloud that is iteratively generated in an input-dependent data-driven manner. We demonstrate that both tracking and mapping can be performed with the same point-based neural scene representation by minimizing an RGBD-based r…

    Submitted 12 September, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: ICCV 2023. 18 Pages, 12 Figures

  25. arXiv:2303.17209

    cs.CV

    Human from Blur: Human Pose Tracking from Blurry Images

    Authors: Yiming Zhao, Denys Rozumnyi, Jie Song, Otmar Hilliges, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose a method to estimate 3D human poses from substantially blurred images. The key idea is to tackle the inverse problem of image deblurring by modeling the forward problem with a 3D human model, a texture map, and a sequence of poses to describe human motion. The blurring process is then modeled by a temporal image aggregation step. Using a differentiable renderer, we can solve the inverse…

    Submitted 25 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: typos and minor error fixed

  26. arXiv:2302.03594

    cs.CV

    NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

    Authors: Zihan Zhu, Songyou Peng, Viktor Larsson, Zhaopeng Cui, Martin R. Oswald, Andreas Geiger, Marc Pollefeys

    Abstract: Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, previous works in this direction either rely on RGB-D sensors, or require a separate monocular SLAM approach for camera tracking and do not produce high-fidelity dense 3D scene reconstruction. In this paper, we present NICER-SLAM, a dense RGB SLAM…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Video: https://youtu.be/tUXzqEZWg2w

  27. arXiv:2212.12395

    cs.CV

    Detecting Objects with Context-Likelihood Graphs and Graph Refinement

    Authors: Aritra Bhowmik, Yu Wang, Nora Baka, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The goal of this paper is to detect objects by exploiting their interrelationships. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly. We first propose a novel way of creating a graphical representation of an image from inter-object relation priors and initial class predictions, we call a context-likelihood…

    Submitted 27 September, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures. In Proceedings of International Conference on Computer Vision (ICCV) 2023

  28. arXiv:2212.07766

    cs.CV

    DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients

    Authors: Rémi Pautrat, Daniel Barath, Viktor Larsson, Martin R. Oswald, Marc Pollefeys

    Abstract: Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their spatial extent and the structural information they provide. Traditional line detectors based on the image gradient are extremely fast and accurate, but lack robustness in noisy images and challenging conditions. Their learned counterparts are more…

    Submitted 28 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023

  29. NeuralMeshing: Differentiable Meshing of Implicit Neural Representations

    Authors: Mathias Vetsch, Sandro Lombardi, Marc Pollefeys, Martin R. Oswald

    Abstract: The generation of triangle meshes from point clouds, i.e. meshing, is a core task in computer graphics and computer vision. Traditional techniques directly construct a surface mesh using local decision heuristics, while some recent methods based on neural implicit representations try to leverage data-driven approaches for this meshing process. However, it is challenging to define a learnable repre…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in "44th DAGM German Conference on Pattern Recognition (GCPR 2022), Konstanz, Germany, September 27-30, 2022, Proceedings", and is available at https://doi.org/10.1007/978-3-031-16788-1_20

  30. arXiv:2207.11467

    cs.CV cs.AI

    CompNVS: Novel View Synthesis with Scene Completion

    Authors: Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald

    Abstract: We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. While generative neural approaches have demonstrated spectacular results on 2D images, they have not yet achieved similar photorealistic results in combination with scene completion where a spatial 3D scene understanding is essential. To this end, we propose a generative pipeline pe…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  31. arXiv:2204.03353

    cs.CV

    Learning Online Multi-Sensor Depth Fusion

    Authors: Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool

    Abstract: Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics…

    Submitted 21 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to ECCV 2022. 31 pages, 17 figures, 15 Tables

  32. arXiv:2203.15601

    cs.CV cs.LG eess.IV

    Photographic Visualization of Weather Forecasts with Generative Adversarial Networks

    Authors: Christian Sigg, Flavia Cavallaro, Tobias Günther, Martin R. Oswald

    Abstract: Outdoor webcam images are an information-dense yet accessible visualization of past and present weather conditions, and are consulted by meteorologists and the general public alike. Weather forecasts, however, are still communicated as text, pictograms or charts. We therefore introduce a novel method that uses photographic images to also visualize future weather conditions. This is challenging,…

    Submitted 29 March, 2022; originally announced March 2022.

  33. NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis

    Authors: Zuria Bauer, Zuoyue Li, Sergio Orts-Escolano, Miguel Cazorla, Marc Pollefeys, Martin R. Oswald

    Abstract: Building upon the recent progress in novel view synthesis, we propose its application to improve monocular depth estimation. In particular, we propose a novel training method split in three main steps. First, the prediction results of a monocular depth network are warped to an additional view point. Second, we apply an additional image synthesis network, which corrects and improves the quality of…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 8 pages (main paper), 9 pages (supplementary material), 14 figures, 4 tables

    Journal ref: 2021 International Conference on 3D Vision (3DV)

  34. arXiv:2112.12130

    cs.CV

    NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

    Authors: Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys

    Abstract: Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). Nevertheless, existing methods produce over-smoothed scene reconstructions and have difficulty scaling up to large scenes. These limitations are mainly due to their simple fully-connected network architecture that does not incorpo…

    Submitted 21 April, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, first two authors contributed equally. Project page: https://pengsongyou.github.io/nice-slam

  35. arXiv:2111.14465

    cs.CV cs.AI cs.GR cs.LG

    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

    Abstract: We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using dif…

    Submitted 7 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 camera-ready

    Journal ref: 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  36. arXiv:2111.13087

    cs.CV

    BoxeR: Box-Attention for 2D and 3D Transformers

    Authors: Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M. Snoek

    Abstract: In this paper, we propose a simple attention mechanism, we call box-attention. It enables spatial interaction between grid features, as sampled from boxes of interest, and improves the learning capability of transformers for several vision tasks. Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on…

    Submitted 25 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: In Proceeding of CVPR'2022

  37. arXiv:2110.06436

    cs.CV

    Non-local Recurrent Regularization Networks for Multi-view Stereo

    Authors: Qingshan Xu, Martin R. Oswald, Wenbing Tao, Marc Pollefeys, Zhaopeng Cui

    Abstract: In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation. Since 3D cost volume filtering is usually memory-consuming, recurrent 2D cost map regularization has recently become popular and has shown great potential in reconstructing 3D models of different scales. However, existing recurrent methods only model the local dependencies in the depth domain,…

    Submitted 12 October, 2021; originally announced October 2021.

  38. arXiv:2108.13995

    cs.CV

    RealisticHands: A Hybrid Model for 3D Hand Reconstruction

    Authors: Michael Seeber, Roi Poranne, Marc Pollefeys, Martin R. Oswald

    Abstract: Estimating 3D hand meshes from RGB images robustly is a highly desirable task, made challenging due to the numerous degrees of freedom, and issues such as self similarity and occlusions. Previous methods generally either use parametric 3D hand models or follow a model-free approach. While the former can be considered more robust, e.g. to occlusions, they are less expressive. We propose a hybrid ap…

    Submitted 1 February, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: International Conference on 3D Vision (3DV) 2021

  39. arXiv:2108.05246

    cs.CV

    A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes

    Authors: Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstrom, Cristian Sminchisescu, Luc Van Gool

    Abstract: This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed deep neural network based approach learns to fuse the depth over frames with suitable semantic labels in the scene space. Our approach exploits the joint volumetric representatio…

    Submitted 28 December, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at IEEE Robotics and Automation Letters (RA-L), 2022. Draft info: 9 pages, 5 figures, 4 tables

  40. arXiv:2106.08762

    cs.CV

    Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

    Abstract: We address the novel task of jointly reconstructing the 3D shape, texture, and motion of an object from a single motion-blurred image. While previous approaches address the deblurring problem only in the 2D image domain, our proposed rigorous modeling of all object properties in the 3D domain enables the correct description of arbitrary object motion. This leads to significantly better image decom…

    Submitted 26 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  41. arXiv:2104.03362

    cs.CV

    SOLD2: Self-supervised Occlusion-aware Line Description and Detection

    Authors: Rémi Pautrat, Juan-Ting Lin, Viktor Larsson, Martin R. Oswald, Marc Pollefeys

    Abstract: Compared to feature point detection and description, detecting and matching line segments offer additional challenges. Yet, line features represent a promising complement to points for multi-view tasks. Lines are indeed well-defined by the image gradient, frequently appear even in poorly textured areas and offer robust structural cues. We thus hereby introduce the first joint detection and descrip…

    Submitted 9 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: 17 pages, Accepted at CVPR 2021 (Oral)

  42. arXiv:2012.15680  [pdf, other

    cs.CV

    Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes

    Authors: Ayça Takmaz, Danda Pani Paudel, Thomas Probst, Ajad Chhatkuli, Martin R. Oswald, Luc Van Gool

    Abstract: Monocular depth reconstruction of complex and dynamic scenes is a highly challenging problem. While for rigid scenes learning-based methods have been offering promising results even in unsupervised cases, there exists little to no literature addressing the same for dynamic and deformable scenes. In this work, we present an unsupervised monocular framework for dense depth estimation of dynamic scen…

    Submitted 28 October, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  43. arXiv:2012.14240  [pdf, other]

    cs.CV

    DeepSurfels: Learning Online Appearance Fusion

    Authors: Marko Mihajlovic, Silvan Weder, Marc Pollefeys, Martin R. Oswald

    Abstract: We present DeepSurfels, a novel hybrid scene representation for geometry and appearance information. DeepSurfels combines explicit and neural building blocks to jointly encode geometry and appearance information. In contrast to established representations, DeepSurfels better represents high-frequency textures, is well-suited for online updates of appearance information, and can be easily combined…

    Submitted 30 May, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2021

  44. FMODetect: Robust Detection of Fast Moving Objects

    Authors: Denys Rozumnyi, Jiri Matas, Filip Sroubek, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose the first learning-based approach for fast moving objects detection. Such objects are highly blurred and move over large distances within one video frame. Fast moving objects are associated with a deblurring and matting problem, also called deblatting. We show that the separation of deblatting into consecutive matting and deblurring allows achieving real-time performance, i.e. an order…

    Submitted 17 August, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted to International Conference on Computer Vision (ICCV) 2021

    Journal ref: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

  45. arXiv:2012.06628  [pdf, other]

    cs.CV

    Sat2Vid: Street-view Panoramic Video Synthesis from a Single Satellite Image

    Authors: Zuoyue Li, Zhenqiang Li, Zhaopeng Cui, Rongjun Qin, Marc Pollefeys, Martin R. Oswald

    Abstract: We present a novel method for synthesizing both temporally and geometrically consistent street-view panoramic video from a single satellite image and camera trajectory. Existing cross-view synthesis approaches focus on images, while video synthesis in such a case has not yet received enough attention. For geometrical and temporal consistency, our approach explicitly creates a 3D point cloud repres…

    Submitted 5 May, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: Technical Report

  46. DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Jiri Matas, Marc Pollefeys

    Abstract: Objects moving at high speed appear significantly blurred when captured with cameras. The blurry appearance is especially ambiguous when the object has complex shape or texture. In such cases, classical methods, or even humans, are unable to recover the object's appearance and motion. We propose a method that, given a single image with its estimated background, outputs the object's appearance and…

    Submitted 30 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: CVPR 2021 camera-ready

    Journal ref: 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  47. arXiv:2011.14791  [pdf, other]

    cs.CV

    NeuralFusion: Online Depth Fusion in Latent Space

    Authors: Silvan Weder, Johannes L. Schönberger, Marc Pollefeys, Martin R. Oswald

    Abstract: We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space. While previous fusion methods use an explicit scene representation like signed distance functions (SDFs), we propose a learned feature representation for the fusion. The key idea is a separation between the scene representation used for the fusion and the output scene representation, vi…

    Submitted 8 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

  48. Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors

    Authors: Cathrin Elich, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler

    Abstract: Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose PriSMONet, a novel approach based on Prior Shape knowledge for learning Multi-Object 3D scene decomposition and representations from single images. Our approach learns to decompose images of synthetic scenes with multiple objects on a planar surface into its constituent scene…

    Submitted 3 May, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Preprint accepted to Computer Vision and Image Understanding

  49. arXiv:2009.10467  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Learning of Non-Rigid Residual Flow and Ego-Motion

    Authors: Ivan Tishchenko, Sandro Lombardi, Martin R. Oswald, Marc Pollefeys

    Abstract: Most of the current scene flow methods choose to model scene flow as a per point translation vector without differentiating between static and dynamic components of 3D motion. In this work we present an alternative method for end-to-end scene flow learning by joint estimation of non-rigid residual flow and ego-motion flow for dynamic 3D scenes. We propose to learn the relative rigid transformation…

    Submitted 19 October, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Accepted to 3DV 2020 (oral)

  50. arXiv:2008.00096  [pdf, other]

    cs.CV

    KAPLAN: A 3D Point Descriptor for Shape Completion

    Authors: Audrey Richard, Ian Cherabier, Martin R. Oswald, Marc Pollefeys, Konrad Schindler

    Abstract: We present a novel 3D shape completion method that operates directly on unstructured point clouds, thus avoiding resource-intensive data structures like voxel grids. To this end, we introduce KAPLAN, a 3D point descriptor that aggregates local shape information via a series of 2D convolutions. The key idea is to project the points in a local neighborhood onto multiple planes with different orienta…

    Submitted 16 October, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: 18 pages, 15 figures