
Showing 1–17 of 17 results for author: VS, V

  1. arXiv:2403.12960

    cs.CV

    FaceXFormer: A Unified Transformer for Facial Analysis

    Authors: Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel

    Abstract: In this work, we introduce FaceXFormer, an end-to-end unified transformer model for a comprehensive range of facial analysis tasks such as face parsing, landmark detection, head pose estimation, attribute recognition, and estimation of age, gender, race, and landmark visibility. Conventional methods in face analysis have often relied on task-specific designs and preprocessing techniques, which l…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project page: https://kartik-3004.github.io/facexformer_web/

  2. arXiv:2403.09620

    cs.CV

    PosSAM: Panoptic Open-vocabulary Segment Anything

    Authors: Vibashan VS, Shubhankar Borse, Hyo** Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli

    Abstract: In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, its decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing appr…

    Submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2401.10484

    cs.IR cs.AI cs.AR

    Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network Pruning

    Authors: Rajaram R, Manoj Bharadhwaj, Vasan VS, Nargis Pervin

    Abstract: This study introduces an innovative approach aimed at the efficient pruning of neural networks, with a particular focus on their deployment on edge devices. Our method involves the integration of the Lottery Ticket Hypothesis (LTH) with the Knowledge Distillation (KD) framework, resulting in the formulation of three distinct pruning models. These models have been developed to address scalability i…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted in WITS 2023 as a workshop paper

  4. arXiv:2312.14126

    cs.CV

    Entropic Open-set Active Learning

    Authors: Bardia Safaei, Vibashan VS, Celso M. de Melo, Vishal M. Patel

    Abstract: Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real-world scenarios where the unlabeled data contains unknown categories. Recently, a few studies have attempted to tackle the AL problem for the open-set setting.…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted in AAAI 2024

  5. arXiv:2303.16891

    cs.CV

    Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

    Authors: Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu

    Abstract: Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) categories. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023. Project site: https://vibashan.github.io/ovis-web/

  6. arXiv:2211.05883

    cs.CV

    Open-Set Automatic Target Recognition

    Authors: Bardia Safaei, Vibashan VS, Celso M. de Melo, Shuowen Hu, Vishal M. Patel

    Abstract: Automatic Target Recognition (ATR) is a category of computer vision algorithms that attempt to recognize targets in data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and surveillance applications. Existing ATR algorithms are developed for traditional closed-set methods where training and testing have the same class distribution. Th…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2023

  7. arXiv:2204.05289

    cs.CV

    Towards Online Domain Adaptive Object Detection

    Authors: Vibashan VS, Poojan Oza, Vishal M. Patel

    Abstract: Existing object detection models assume both the training and test data are sampled from the same source domain. This assumption does not hold true when these detectors are deployed in real-world applications, where they encounter new visual domains. Unsupervised Domain Adaptation (UDA) methods are generally employed to mitigate the adverse effects caused by domain shift. Existing UDA methods opera…

    Submitted 21 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to WACV 2023

  8. arXiv:2203.15793

    cs.CV

    Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection

    Authors: Vibashan VS, Poojan Oza, Vishal M. Patel

    Abstract: Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift. Specifically, UDA methods try to align the source and target representations to improve the generalization on the target domain. Further, UDA methods work under the assumption that the source data is accessible during the adaptation process. However, in real-world scenarios, the labelled source data…

    Submitted 21 March, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2023. Project site: https://viudomain.github.io/irg-sfda-web/

  9. arXiv:2203.15792

    cs.CV

    Target and Task specific Source-Free Domain Adaptive Image Segmentation

    Authors: Vibashan VS, Jeya Maria Jose Valanarasu, Vishal M. Patel

    Abstract: Solving the domain shift problem during inference is essential in medical imaging, as most deep-learning based solutions suffer from it. In practice, domain shifts are tackled by performing Unsupervised Domain Adaptation (UDA), where a model is adapted to an unlabelled target domain by leveraging the labelled source data. In medical scenarios, the data comes with huge privacy concerns making it di…

    Submitted 10 March, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

  10. arXiv:2203.05574

    eess.IV cs.CV

    On-the-Fly Test-time Adaptation for Medical Image Segmentation

    Authors: Jeya Maria Jose Valanarasu, Pengfei Guo, Vibashan VS, Vishal M. Patel

    Abstract: One major problem in deep learning-based solutions for medical imaging is the drop in performance when a model is tested on a data distribution different from the one that it is trained on. Adapting the source model to target data distribution at test-time is an efficient solution for the data-shift problem. Previous methods solve this by adapting the model to target distribution by using techniqu…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: Tech Report

  11. arXiv:2112.08189

    cs.CV cs.RO eess.IV

    ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery

    Authors: Mobarakol Islam, Vibashan VS, Chwee Ming Lim, Hongliang Ren

    Abstract: Representation learning of the task-oriented attention while tracking instruments holds vast potential in image-guided robotic surgery. Incorporating cognitive ability to automate the camera control enables the surgeon to concentrate more on dealing with surgical instruments. The objective is to reduce the operation time and facilitate the surgery for both surgeons and patients. We propose an end-t…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: 12 pages

  12. arXiv:2110.03143

    cs.CV

    Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning

    Authors: Vibashan VS, Domenick Poster, Suya You, Shuowen Hu, Vishal M. Patel

    Abstract: Object detectors trained on large-scale RGB datasets are being extensively employed in real-world applications. However, these RGB-trained models suffer a performance drop under adverse illumination and lighting conditions. Infrared (IR) cameras are robust under such conditions and can be helpful in real-world applications. Though thermal cameras are widely used for military applications and incre…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted to WACV 2022

  13. arXiv:2107.09011

    cs.CV

    Image Fusion Transformer

    Authors: Vibashan VS, Jeya Maria Jose Valanarasu, Poojan Oza, Vishal M. Patel

    Abstract: In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information. In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion. Specifically, CNN-based methods perform image fusion by fusing local features. However, they do not consider long-range dependencies that are…

    Submitted 4 December, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: Accepted at ICIP 2022

  14. arXiv:2105.13502

    cs.CV cs.LG

    Unsupervised Domain Adaptation of Object Detectors: A Survey

    Authors: Poojan Oza, Vishwanath A. Sindagi, Vibashan VS, Vishal M. Patel

    Abstract: Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications such as classification, segmentation, and detection. However, learning highly accurate models relies on the availability of large-scale annotated datasets. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually dist…

    Submitted 4 July, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

  15. arXiv:2104.00985

    eess.IV cs.CV

    Brain Tumor Segmentation and Survival Prediction using 3D Attention UNet

    Authors: Mobarakol Islam, Vibashan VS, V Jeya Maria Jose, Navodini Wijethilake, Uppal Utkarsh, Hongliang Ren

    Abstract: In this work, we develop an attention convolutional neural network (CNN) to segment brain tumors from Magnetic Resonance Images (MRI). Further, we predict the survival rate using various machine learning methods. We adopt a 3D UNet architecture and integrate channel and spatial attention with the decoder network to perform segmentation. For survival prediction, we extract some novel radiomic featu…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: MICCAI-BrainLes Workshop

  16. arXiv:2103.04224

    cs.CV

    MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

    Authors: Vibashan VS, Vikram Gupta, Poojan Oza, Vishwanath A. Sindagi, Vishal M. Patel

    Abstract: Existing approaches for unsupervised domain adaptive object detection perform feature alignment via adversarial training. While these methods achieve reasonable improvements in performance, they typically perform category-agnostic domain alignment, thereby resulting in negative transfer of features. To overcome this issue, in this work, we attempt to incorporate category information into the domai…

    Submitted 3 April, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  17. AP-MTL: Attention Pruned Multi-task Learning Model for Real-time Instrument Detection and Segmentation in Robot-assisted Surgery

    Authors: Mobarakol Islam, Vibashan VS, Hongliang Ren

    Abstract: Surgical scene understanding and multi-task learning are crucial for image-guided robotic surgery. Training a real-time robotic system for the detection and segmentation of high-resolution images poses a challenging problem given limited computational resources. The perception drawn can be applied in effective real-time feedback, surgical skill assessment, and human-robot collaborative sur…

    Submitted 31 May, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Accepted to ICRA 2020