Search | arXiv e-print repository

doi 10.1609/aaai.v38i5.28227

Semi-supervised Open-World Object Detection

Authors: Sahal Shaji Mullappilly, Abhishek Singh Gehlot, Rao Muhammad Anwer, Fahad Shahbaz Khan, Hisham Cholakkal

Abstract: Conventional open-world object detection (OWOD) problem setting first distinguishes known and unknown classes and then later incrementally learns the unknown objects when introduced with labels in the subsequent tasks. However, the current OWOD formulation heavily relies on the external human oracle for knowledge input during the incremental learning stages. Such reliance on run-time makes this formulation less realistic in a real-world deployment. To address this, we introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD), that reduces the annotation cost by casting the incremental learning stages of OWOD in a semi-supervised manner. We demonstrate that the performance of the state-of-the-art OWOD detector dramatically deteriorates in the proposed SS-OWOD setting. Therefore, we introduce a novel SS-OWOD detector, named SS-OWFormer, that utilizes a feature-alignment scheme to better align the object query representations between the original and augmented images to leverage the large unlabeled and few labeled data. We further introduce a pseudo-labeling scheme for unknown detection that exploits the inherent capability of decoder object queries to capture object-specific information. We demonstrate the effectiveness of our SS-OWOD problem setting and approach for remote sensing object detection, proposing carefully curated splits and baseline performance evaluations. Our experiments on 4 datasets including MS COCO, PASCAL, Objects365 and DOTA demonstrate the effectiveness of our approach.
Our source code, models and splits are available here - https://github.com/sahalshajim/SS-OWFormer

Submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted to AAAI 2024 (Main Track)

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2024

arXiv:2308.06825 [pdf, other]

On the relative abundances of Cavansite and Pentagonite

Authors: Bhalchandra S. Pujari, Sagar Gehlot, Mihir Arjunwadkar, Dilip G. Kanhere, Raymond Duraiswami

Abstract: Cavansite is a visually stunning blue vanadosilicate mineral with limited occurrences worldwide, whereas Pentagonite is a closely related dimorph with similar physical and chemical properties, yet is extremely rare compared to Cavansite. The reasons behind Pentagonite's exceptional rarity remain largely unknown. In this study, (a) density functional theory (DFT) is utilized to investigate the electronic structures of Cavansite and Pentagonite at ground state and finite pressures; (b) a two-state Boltzmann probability model is then employed to construct a comprehensive phase diagram that reveals the abundance of each species across a wide range of pressure and temperature conditions; and (c) dehydration characteristics of these two minerals are explored. The present analysis reveals the key factors that contribute to the relative scarcity of Pentagonite, including differences in structural arrangement and electronic configurations between the two minerals.
Specifically, it shows that (a) because of the peculiar arrangements of SiO4 polyhedra, Cavansite forms a compact structure (about 2.7% less in volume) resulting in lower energy; (b) at a temperature of about 650K only about 1% Pentagonite can form; (c) vanadium induces a highly localized state in both of these otherwise large-band-gap insulators resulting in an extremely weak magnetic phase that is unlikely to be observed at any reasonable finite temperature; and (d) water molecules are loosely bound inside the microporous crystals of Cavansite and Pentagonite, suggesting potential applications of these minerals in various technological fields.

Submitted 2 March, 2024; v1 submitted 13 August, 2023; originally announced August 2023.
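
The two-state Boltzmann probability model mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic textbook two-level model, not necessarily the exact formulation used by the authors (who also account for pressure); the function names and the choice of an energy gap in eV are assumptions made for illustration only.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def higher_state_fraction(delta_e_ev: float, temperature_k: float) -> float:
    """Occupation probability of the higher-energy state in a two-state model.

    delta_e_ev is the energy of the higher state minus that of the lower
    state (assumed here to be a single number in eV; hypothetical input).
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(delta_e_ev / (K_B_EV_PER_K * temperature_k)))


def energy_gap_for_fraction(fraction: float, temperature_k: float) -> float:
    """Invert the model: energy gap (eV) that yields the given fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return K_B_EV_PER_K * temperature_k * math.log((1.0 - fraction) / fraction)


if __name__ == "__main__":
    # Gap that this generic model would need for a 1% minority phase at 650 K.
    gap = energy_gap_for_fraction(0.01, 650.0)
    print(f"illustrative gap: {gap:.3f} eV")
    print(f"fraction at 300 K for the same gap: {higher_state_fraction(gap, 300.0):.2e}")
```

In this simplified picture, a minority fraction of about 1% at about 650 K corresponds to an energy gap of a few tenths of an eV; that figure follows from the generic formula alone and is not a value reported in the paper.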

arXiv:2208.14206 [pdf, other]

FUSION: Fully Unsupervised Test-Time Stain Adaptation via Fused Normalization Statistics

Authors: Nilanjan Chattopadhyay, Shiv Gehlot, Nitin Singhal

Abstract: Staining reveals the microstructure of the aspirate while creating histopathology slides. Stain variation, defined as a chromatic difference between the source and the target, is caused by varying characteristics during staining, resulting in a distribution shift and poor performance on the target. The goal of stain normalization is to match the target's chromatic distribution to that of the source. However, stain normalization causes the underlying morphology to distort, resulting in an incorrect diagnosis. We propose FUSION, a new method for promoting stain adaptation by adjusting the model to the target in an unsupervised test-time scenario, eliminating the necessity for significant labelling at the target end. FUSION works by altering the target's batch normalization statistics and fusing them with source statistics using a weighting factor. The algorithm reduces to one of two extremes based on the weighting factor. Despite the lack of training or supervision, FUSION surpasses existing equivalent algorithms for classification and dense predictions (segmentation), as demonstrated by comprehensive experiments on two public datasets.

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted in European Conference on Computer Vision (ECCV) 2022 Workshop: AI-enabled medical image analysis (AIMIA)
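
The idea in the abstract above of fusing target and source batch normalization statistics with a weighting factor can be sketched as follows. The abstract does not give the fusion formula, so the convex combination below, the name `alpha`, and the single-channel list representation are all assumptions for illustration, not the authors' implementation.

```python
import math
from statistics import fmean, pvariance


def fuse_bn_statistics(source_mean, source_var, target_batch, alpha):
    """Blend stored source statistics with statistics of one target batch.

    alpha is a hypothetical weighting factor in [0, 1]:
    alpha = 1 keeps the source statistics only, alpha = 0 uses the
    target batch statistics only (the two extremes).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    target_mean = fmean(target_batch)
    target_var = pvariance(target_batch, mu=target_mean)
    fused_mean = alpha * source_mean + (1.0 - alpha) * target_mean
    fused_var = alpha * source_var + (1.0 - alpha) * target_var
    return fused_mean, fused_var


def normalize(batch, mean, var, eps=1e-5):
    """Apply batch-norm style standardization with the given statistics."""
    scale = math.sqrt(var + eps)
    return [(value - mean) / scale for value in batch]


if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0]  # one channel of a target batch (toy data)
    mean, var = fuse_bn_statistics(0.0, 1.0, activations, alpha=0.5)
    print(normalize(activations, mean, var))
```

No gradient step or label is involved in this sketch; only the normalization statistics change, which is consistent with the unsupervised test-time setting the abstract describes.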

arXiv:2205.05543 [pdf, other]

An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers

Authors: Gokul Karthik Kumar, Sahal Shaji Mullappilly, Abhishek Singh Gehlot

Abstract: Self-supervised learning (SSL) methods such as masked language modeling have shown massive performance gains by pretraining transformer models for a variety of natural language processing tasks. The follow-up research adapted similar methods like masked image modeling in vision transformer and demonstrated improvements in the image classification task. Such simple self-supervised methods are not exhaustively studied for object detection transformers (DETR, Deformable DETR) as their transformer encoder modules take input in the convolutional neural network (CNN) extracted feature space rather than the image space as in general vision transformers. However, the CNN feature maps still maintain the spatial relationship and we utilize this property to design self-supervised learning approaches to train the encoder of object detection transformers in pretraining and multi-task learning settings. We explore common self-supervised methods based on image reconstruction, masked image modeling and jigsaw. Preliminary experiments on the iSAID dataset demonstrate faster convergence of DETR in the initial epochs in both pretraining and multi-task learning settings; nonetheless, similar improvement is not observed in the case of multi-task learning with Deformable DETR. The code for our experiments with DETR and Deformable DETR is available at https://github.com/gokulkarthik/detr and https://github.com/gokulkarthik/Deformable-DETR respectively.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Final Project for the course "Visual Object Detection And Recognition" (CV703) at MBZUAI

arXiv:2204.05814 [pdf, other]

MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages

Authors: Gokul Karthik Kumar, Abhishek Singh Gehlot, Sahal Shaji Mullappilly, Karthik Nandakumar

Abstract: Accuracy of English-language Question Answering (QA) systems has improved significantly in recent years with the advent of Transformer-based models (e.g., BERT). These models are pre-trained in a self-supervised fashion with a large English text corpus and further fine-tuned with a massive English QA dataset (e.g., SQuAD). However, QA datasets on such a scale are not available for most of the other languages. Multi-lingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource languages to low-resource languages. Since these models are pre-trained with huge text corpora containing multiple languages, they typically learn language-agnostic embeddings for tokens from different languages. However, directly training an mBERT-based QA system for low-resource languages is challenging due to the paucity of training data. In this work, we augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model, which is already pre-trained in English. Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts the question-answering performance, whereas the performance degrades in the case of cross-language families. We further show that introducing a contrastive loss between the translated question-context feature pairs during the fine-tuning process prevents such degradation with cross-lingual family translations and leads to marginal improvement.
The code for this work is available at https://github.com/gokulkarthik/mucot.

Submitted 12 April, 2022; originally announced April 2022.

Comments: Accepted for oral presentation at ACL 2022 Workshop on Speech and Language Technologies for Dravidian Languages

arXiv:2109.07029 [pdf, other]

Seeking an Optimal Approach for Computer-Aided Pulmonary Embolism Detection

Authors: Nahid Ul Islam, Shiv Gehlot, Zongwei Zhou, Michael B Gotway, Jianming Liang

Abstract: Pulmonary embolism (PE) represents a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and in some patients, death. This disorder is commonly diagnosed using CT pulmonary angiography (CTPA). Deep learning holds great promise for the computer-aided CTPA diagnosis (CAD) of PE. However, numerous competing methods for a given task in the deep learning literature exist, causing great confusion regarding the development of a CAD PE system. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis using CTPA at both the image and exam levels. At the image level, we compare convolutional neural networks (CNNs) with vision transformers, and contrast self-supervised learning (SSL) with supervised learning, followed by an evaluation of transfer learning compared with training from scratch. At the exam level, we focus on comparing conventional classification (CC) with multiple instance learning (MIL). Our extensive experiments show: (1) transfer learning consistently boosts performance despite differences between natural images and CT scans; (2) transfer learning with SSL surpasses its supervised counterparts; (3) CNNs outperform vision transformers, which otherwise show satisfactory performance; and (4) CC is, surprisingly, superior to MIL. Compared with the state of the art, our optimal approach provides an AUC gain of 0.2% and 1.05% for image-level and exam-level, respectively.

Submitted 14 September, 2021; originally announced September 2021.

arXiv:2006.00304 [pdf, other]

doi 10.1016/j.media.2020.101661

SDCT-AuxNet$^θ$: DCT Augmented Stain Deconvolutional CNN with Auxiliary Classifier for Cancer Diagnosis

Authors: Shiv Gehlot, Anubha Gupta, Ritu Gupta

Abstract: Acute lymphoblastic leukemia (ALL) is a pervasive pediatric white blood cell cancer across the globe. With the popularity of convolutional neural networks (CNNs), computer-aided diagnosis of cancer has attracted considerable attention. Such tools are easily deployable and are cost-effective. Hence, these can enable extensive coverage of cancer diagnostic facilities. However, the development of such a tool for ALL cancer has been challenging so far due to the non-availability of a large training dataset. The visual similarity between the malignant and normal cells adds to the complexity of the problem. This paper discusses the recent release of a large dataset and presents a novel deep learning architecture for the classification of cell images of ALL cancer. The proposed architecture, namely, SDCT-AuxNet$^θ$ is a 2-module framework that utilizes a compact CNN as the main classifier in one module and a Kernel SVM as the auxiliary classifier in the other one. While the CNN classifier uses features through bilinear-pooling, spectral-averaged features are used by the auxiliary classifier. Further, this CNN is trained on the stain deconvolved quantity images in the optical density domain instead of the conventional RGB images. A novel test strategy is proposed that exploits both the classifiers for decision making using the confidence scores of their predicted class labels.
Elaborate experiments have been carried out on our recently released public dataset of 15114 images of ALL cancer and healthy cells to establish the validity of the proposed methodology that is also robust to subject-level variability. A weighted F1 score of 94.8% is obtained, which is the best so far on this challenging dataset.

Submitted 7 June, 2020; v1 submitted 30 May, 2020; originally announced June 2020.

Comments: The final version of this preprint has been published in Medical Image Analysis

Journal ref: Medical Image Analysis, 61, 101661, 2020

arXiv:1704.08189 [pdf, ps, other]

Properties of Ultra Gamma Function

Authors: Kuldeep Singh Gehlot

Abstract: In this paper we study the integral of type \[_{δ,a}Γ_{ρ,b}(x) =Γ(δ,a;ρ,b)(x)=\int_{0}^{\infty}t^{x-1}e^{-\frac{t^δ}{a}-\frac{t^{-ρ}}{b}}dt.\] Different authors have called this integral by different names, such as ultra gamma function, generalized gamma function, Krätzel integral, inverse Gaussian integral, reaction-rate probability integral, Bessel integral, etc. We prove several identities and a recurrence relation for this integral, which we call the Four Parameter Gamma Function. We also establish the relation between the Four Parameter Gamma Function, the p-k Gamma Function and the Classical Gamma Function. Under some conditions, the Four Parameter Gamma Function can be evaluated in terms of the Hypergeometric function.

Submitted 8 March, 2018; v1 submitted 15 April, 2017; originally announced April 2017.

Comments: New Paper

MSC Class: 33B15
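
The integral defined in the abstract above can be evaluated numerically as a sanity check. The following is a minimal sketch, assuming positive parameters δ, a, ρ, b and x; the function name `ultra_gamma`, the substitution t = exp(u), and the truncation and step settings are choices made here for illustration and do not come from the paper.

```python
import math


def _exp_or_inf(value: float) -> float:
    """math.exp that saturates to +inf instead of raising OverflowError."""
    return math.exp(value) if value < 700.0 else math.inf


def ultra_gamma(x, delta, a, rho, b, half_width=50.0, step=0.01):
    """Approximate int_0^inf t^(x-1) exp(-t^delta/a - t^(-rho)/b) dt.

    With t = exp(u) the integral becomes
    int exp(x*u - exp(delta*u)/a - exp(-rho*u)/b) du over the real line.
    It is truncated to [-half_width, half_width] and summed with the
    trapezoid rule. Assumes x, delta, a, rho, b are positive and finite.
    """
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * step
        log_f = x * u - _exp_or_inf(delta * u) / a - _exp_or_inf(-rho * u) / b
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_f)
    return total * step


if __name__ == "__main__":
    # With a very large b the second exponential term is negligible, so
    # delta = a = 1 should reproduce the classical Gamma function.
    print(ultra_gamma(2.5, 1.0, 1.0, 1.0, 1e300), math.gamma(2.5))
    # Standard closed form for delta = rho = a = b = 1 and x = 1/2.
    print(ultra_gamma(0.5, 1.0, 1.0, 1.0, 1.0), math.sqrt(math.pi) * math.exp(-2.0))
```

The two checks rely on well-known special cases: the classical Gamma function when the t^{-ρ}/b term vanishes, and the identity ∫ t^{-1/2} e^{-t-1/t} dt = √π e^{-2} on (0, ∞).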

arXiv:1701.01052 [pdf, ps, other]

Two Parameter Gamma Function and its Properties

Authors: Kuldeep Singh Gehlot

Abstract: In this paper we introduce the Two Parameter Gamma Function, Beta Function and Pochhammer Symbol. We name them the p - k Gamma Function, p - k Beta Function and p - k Pochhammer Symbol, and denote them by $_{p}Γ_{k}(x)$, $_{p}B_{k}(x,y)$ and $_{p}(x)_{n,k}$ respectively. We prove, for $_{p}Γ_{k}(x)$, $_{p}B_{k}(x,y)$ and $_{p}(x)_{n,k}$, several of the identities satisfied by the classical Gamma Function, Beta Function and Pochhammer Symbol. We also provide integral representations for $_{p}Γ_{k}(x)$ and $_{p}B_{k}(x,y)$.

Submitted 3 January, 2017; originally announced January 2017.

MSC Class: 33B15

Showing 1–9 of 9 results for author: Gehlot, S