Showing 1–2 of 2 results for author: Govind, M K

  1. arXiv:2406.19391  [pdf, other]

    cs.CV

    Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads

    Authors: Ali Khaleghi Rahimian, Manish Kumar Govind, Subhajit Maity, Dominick Reilly, Christian Kümmerle, Srijan Das, Aritra Dutta

    Abstract: Visual perception tasks are predominantly solved by Vision Transformer (ViT) architectures, which, despite their effectiveness, encounter a computational bottleneck due to the quadratic complexity of computing self-attention. This inefficiency is largely due to the self-attention heads capturing redundant token interactions, reflecting inherent redundancy within visual data. Many works have aimed…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: The code is publicly available at https://github.com/Charlotte-CharMLab/Fibottention

  2. arXiv:2406.09390  [pdf, other]

    cs.CV cs.LG

    LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

    Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

    Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,…

    Submitted 13 June, 2024; originally announced June 2024.