Showing 1–4 of 4 results for author: Parelli, M

  1. arXiv:2311.18448  [pdf, other]

    cs.CV

    HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

    Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges

    Abstract: Since humans interact with diverse objects every day, the holistic 3D capture of these interactions is important to understand and model human behaviour. However, most existing methods for hand-object reconstruction from RGB either assume pre-scanned object templates or heavily rely on limited 3D hand-object data, restricting their ability to scale and generalize to more unconstrained interaction…

    Submitted 30 November, 2023; originally announced November 2023.

  2. arXiv:2309.03726  [pdf, other]

    cs.CV

    Interpretable Visual Question Answering via Reasoning Supervision

    Authors: Maria Parelli, Dimitrios Mallis, Markos Diomataris, Vassilis Pitsikalis

    Abstract: Transformer-based architectures have recently demonstrated remarkable performance in the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate thi…

    Submitted 7 September, 2023; originally announced September 2023.

  3. arXiv:2306.02329  [pdf, other]

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  4. arXiv:2304.06061  [pdf, other]

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA