Showing 1–3 of 3 results for author: Hars, N

  1. arXiv:2401.03771  [pdf, other]

    cs.CV

    NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

    Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald

    Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and d…

    Submitted 8 January, 2024; originally announced January 2024.

  2. arXiv:2306.02329  [pdf, other]

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  3. arXiv:2304.06061  [pdf, other]

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA