Showing 1–8 of 8 results for author: Gani, H

  1. arXiv:2406.10326 [pdf, other]

    cs.CV

    VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs

    Authors: Rohit Bharadwaj, Hanan Gani, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

    Abstract: The recent developments in Large Multi-modal Video Models (Video-LMMs) have significantly enhanced our ability to interpret and analyze video data. Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios, e.g., towards identifying deepfakes, manipulated video content, traffic accident…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data: https://huggingface.co/datasets/rohit901/VANE-Bench

  2. arXiv:2402.17725 [pdf, other]

    eess.IV cs.CV

    MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation

    Authors: Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan

    Abstract: Volumetric medical segmentation is a critical component of 3D medical image analysis that delineates different semantic regions. Deep neural networks have significantly improved volumetric medical segmentation, but they generally require large-scale annotated data to achieve better performance, which can be expensive and prohibitive to obtain. To address this limitation, existing works typically p…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Code available at https://github.com/hananshafi/MedContext

  3. arXiv:2402.08070 [pdf, other]

    cs.CV

    Multi-Attribute Vision Transformers are Efficient and Robust Learners

    Authors: Hanan Gani, Nada Saadi, Noor Hussein, Karthik Nandakumar

    Abstract: Since their inception, Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) across a wide spectrum of tasks. ViTs exhibit notable characteristics, including global attention, resilience against occlusions, and adaptability to distribution shifts. One underexplored aspect of ViTs is their potential for multi-attribute learning, referring to the…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Code: https://github.com/hananshafi/MTL-ViT. arXiv admin note: text overlap with arXiv:2207.08677 by other authors

  4. arXiv:2311.01459 [pdf, other]

    cs.CV

    Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

    Authors: Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

    Abstract: The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text prompts for unseen domains. While effective, this overlooks the key cause for performance degradation to unseen domains -- distribution shift. In this w…

    Submitted 10 January, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2310.10640 [pdf, other]

    cs.CV

    LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

    Authors: Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

    Abstract: Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in generating images from short, single-object descriptions, these models often struggle to faithfully capture all the nuanced details within longer and more elaborate text…

    Submitted 25 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  6. arXiv:2210.07240 [pdf, other]

    cs.CV

    How to Train Vision Transformer on Small-scale Datasets?

    Authors: Hanan Gani, Muzammal Naseer, Mohammad Yaqub

    Abstract: Vision Transformer (ViT), a radically different architecture than convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases. Therefore, successful training of such models is mainly attributed to pre-t…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at BMVC 2022

  7. arXiv:1904.11576 [pdf]

    physics.ao-ph cs.LG stat.ML

    Forecasting Drought Using Multilayer Perceptron Artificial Neural Network Model

    Authors: Zulifqar Ali, Ijaz Hussain, Muhammad Faisal, Hafiza Mamona Nazir, Tajammal Hussain, Muhammad Yousaf Shad, Alaa Mohamd Shoukry, Showkat Hussain Gani

    Abstract: These days human beings are facing many environmental challenges due to frequently occurring drought hazards. It may have an effect on the country's environment, the community, and industries. Several adverse impacts of drought hazard are continued in Pakistan, including other hazards. However, early measurement and detection of drought can provide guidance to water resources management for employi…

    Submitted 17 April, 2019; originally announced April 2019.

  8. arXiv:1809.02875 [pdf, other]

    cs.CV

    A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild

    Authors: Saumya Kumaar, Abhinandan Dogra, Abrar Majeedi, Hanan Gani, Ravi M. Vishwanath, S N Omkar

    Abstract: Facial recognition has always been a challenging task for computer vision scientists and experts. Despite complexities arising due to variations in camera parameters, illumination and face orientations, significant progress has been made in the field with deep learning algorithms now competing with human-level accuracy. But in contrast to the recent advances in face recognition techniques, Disgu…

    Submitted 8 September, 2018; originally announced September 2018.

    Comments: Accepted at 2018 International Conference on Robotics and Computer Vision