Skip to main content

Showing 1–11 of 11 results for author: Irshad, M Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.08488  [pdf, other

    cs.CV cs.AI cs.LG

    ICE-G: Image Conditional Editing of 3D Gaussian Splats

    Authors: Vishnu Jaganathan, Hannah Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira

    Abstract: Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically cor… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR AI4CC Workshop 2024. Project page: https://ice-gaussian.github.io

  2. arXiv:2405.20364  [pdf, other

    cs.CV cs.AI cs.RO

    Learning 3D Robotics Perception using Inductive Priors

    Authors: Muhammad Zubair Irshad

    Abstract: Recent advances in deep learning have led to a data-centric intelligence i.e. artificially intelligent models unlocking the potential to ingest a large amount of data and be really good at performing digital tasks such as text-to-image generation, machine-human conversation, and image recognition. This thesis covers the topic of learning with structured inductive bias and priors to design approach… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Georgia Tech Ph.D. Thesis, December 2023. For more details: https://zubairirshad.com/

  3. arXiv:2404.01300  [pdf, other

    cs.CV cs.AI cs.LG

    NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

    Authors: Muhammad Zubair Irshad, Sergey Zakahrov, Vitor Guizilini, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Neural fields excel in computer vision and robotics due to their ability to understand the 3D visual world such as inferring semantics, geometry, and dynamics. Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: Can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations… ▽ More

    Submitted 18 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 29 pages, 13 figures. Project Page: https://nerf-mae.github.io/

  4. arXiv:2402.12647  [pdf, other

    cs.CV cs.RO

    DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

    Authors: Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki

    Abstract: This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonic… ▽ More

    Submitted 5 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 8 pages. 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  5. arXiv:2310.12974  [pdf, other

    cs.CV cs.RO

    FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects

    Authors: Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira, Muhammad Zubair Irshad

    Abstract: In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer fr… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: https://fsd6d.github.io

  6. arXiv:2308.12967  [pdf, other

    cs.CV cs.AI cs.LG

    NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neur… ▽ More

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted to International Conference on Computer Vision (ICCV), 2023. Project page: https://zubair-irshad.github.io/projects/neo360.html

  7. arXiv:2303.15782  [pdf, other

    cs.CV cs.RO

    CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects

    Authors: Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar

    Abstract: We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation. We use implicit object-centric representations and learn a single geometry and articulation decoder for multiple object categories. Despite training on multiple categories, our decoder achieves a comparable reconstruction accuracy to methods that train bespoke decoders separatel… ▽ More

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: 20 pages, 11 figures, accepted at CVPR 2023

  8. arXiv:2207.13691  [pdf, other

    cs.CV cs.LG cs.RO

    ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon

    Abstract: Our method studies the complex task of object-centric 3D understanding from a single RGB-D observation. As it is an ill-posed problem, existing methods suffer from low performance for both 3D shape and 6D pose and size estimation in complex multi-object scenarios with occlusions. We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estima… ▽ More

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2022

  9. arXiv:2203.01929  [pdf, other

    cs.CV cs.LG cs.RO

    CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

    Authors: Muhammad Zubair Irshad, Thomas Kollar, Michael Laskey, Kevin Stone, Zsolt Kira

    Abstract: This paper studies the complex task of simultaneous multi-object 3D reconstruction, 6D pose and size estimation from a single-view RGB-D observation. In contrast to instance-level pose estimation, we focus on a more challenging problem where CAD models are not available at inference time. Existing approaches mainly follow a complex multi-stage pipeline which first localizes and detects each object… ▽ More

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to ICRA 2022, Project page with videos: https://zubair-irshad.github.io/projects/CenterSnap.html

  10. arXiv:2108.11945  [pdf, other

    cs.RO cs.CL cs.CV

    SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments

    Authors: Muhammad Zubair Irshad, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

    Abstract: This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural language instructions in unseen environments. Existing end-to-end learning-based VLN methods struggle at this task as they focus mostly on utilizing raw visual observations and lack the semantic spatio-temporal reasoning capabili… ▽ More

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 10 pages, 4 figures

  11. arXiv:2104.10674  [pdf, other

    cs.RO

    Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

    Authors: Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira

    Abstract: Deep Learning has revolutionized our ability to solve complex problems such as Vision-and-Language Navigation (VLN). This task requires the agent to navigate to a goal purely based on visual sensory inputs given natural language instructions. However, prior works formulate the problem as a navigation graph with a discrete action space. In this work, we lift the agent off the navigation graph and p… ▽ More

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted to ICRA 2021; Code and Dataset are available at https://github.com/GT-RIPL/robo-vln