
Showing 1–2 of 2 results for author: Vlassis, G

Searching in archive cs.
  1. arXiv:2306.02329

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  2. arXiv:2304.06061

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA