Showing 1–9 of 9 results for author: Gkanatsios, N

  1. arXiv:2402.10885

    cs.RO cs.AI cs.CV cs.LG

    3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

    Authors: Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki

    Abstract: We marry diffusion policies and 3D scene representations for robot manipulation. Diffusion policies learn the action distribution conditioned on the robot and environment state using conditional diffusion models. They have recently been shown to outperform both deterministic and alternative state-conditioned action distribution learning methods. 3D robot policies use 3D scene feature representations ag…

    Submitted 11 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: First two authors contributed equally

  2. arXiv:2402.06559

    cs.LG cs.AI cs.CL cs.RO

    Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

    Authors: Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki

    Abstract: Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control. Reward-gradient guided denoising has been recently proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. Reward-gradient guided denoising requires a differentiable reward fun…

    Submitted 9 February, 2024; originally announced February 2024.

  3. arXiv:2401.02416

    cs.CV cs.AI cs.LG cs.RO

    ODIN: A Single Model for 2D and 3D Segmentation

    Authors: Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

    Abstract: State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that…

    Submitted 25 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Camera Ready (CVPR 2024, Highlight)

  4. arXiv:2306.17817

    cs.RO cs.AI cs.LG

    Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation

    Authors: Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki

    Abstract: 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, which typically demands high-resolution 3D feature grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing…

    Submitted 19 October, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

  5. arXiv:2304.14391

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement

    Authors: Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

    Abstract: Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energ…

    Submitted 23 January, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: First two authors contributed equally | RSS 2023

  6. arXiv:2304.14382

    cs.CV cs.AI cs.LG

    Analogy-Forming Transformers for Few-Shot 3D Parsing

    Authors: Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

    Abstract: We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predic…

    Submitted 30 May, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: ICLR 2023

  7. arXiv:2112.08879

    cs.CV cs.CL

    Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

    Authors: Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

    Abstract: Most models tasked to ground referential utterances in 2D and 3D scenes learn to select the referred object from a pool of object proposals provided by a pre-trained detector. This is limiting because an utterance may refer to visual entities at various levels of granularity, such as the chair, the leg of the chair, or the tip of the front leg of the chair, which may be missed by the detector. We…

    Submitted 21 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: First two authors contributed equally | ECCV 2022 Camera Ready

  8. arXiv:2006.05123

    cs.RO cs.CV

    Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map Representation

    Authors: Georgia Chalvatzaki, Nikolaos Gkanatsios, Petros Maragos, Jan Peters

    Abstract: Inherent morphological characteristics in objects may offer a wide range of plausible grasping orientations that obfuscates the visual learning of robotic grasping. Existing grasp generation approaches are cursed to construct discontinuous grasp maps by aggregating annotations for drastically different orientations per grasping point. Moreover, current methods generate grasp candidates across a si…

    Submitted 2 February, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 7 pages, 4 figures, 5 tables

  9. arXiv:1902.05829

    cs.CV

    Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

    Authors: Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

    Abstract: Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal atten…

    Submitted 15 February, 2019; originally announced February 2019.