
Showing 1–2 of 2 results for author: Rasheed, H A

  1. arXiv:2311.13435

    cs.CV cs.AI

    PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

    Authors: Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan

    Abstract: Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data. The recent approaches extending image-based LMMs to videos either lack the grounding capabilities (e.g., VideoChat, Video-ChatGPT, Video-LLaMA) or do not utilize the audio-signals for better video understanding (e.g., Video-ChatGPT). Addressing these gaps, we propose PG-Video…

    Submitted 13 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Technical Report

  2. arXiv:2105.08788

    cs.CV

    Self-Supervised Learning for Fine-Grained Visual Categorization

    Authors: Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam

    Abstract: Recent research in self-supervised learning (SSL) has shown its capability in learning useful semantic representations from images for classification tasks. Through our work, we study the usefulness of SSL for Fine-Grained Visual Categorization (FGVC). FGVC aims to distinguish objects of visually similar sub categories within a general category. The small inter-class, but large intra-class variati…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: 10 pages, 6 figures