Search | arXiv e-print repository

doi 10.1109/TAES.2023.3337768

Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation

Authors: Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation n… ▽ More Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation network has been previously proposed in the literature for effective ATR annotation. Although this method demonstrates great potential for ATR, it severely suffers from lower annotation performance, higher Fréchet Inception Distance (FID) score, and the presence of visual artifacts in the synthetic images. To address these issues, we propose a hybrid contrastive learning base unpaired domain translation (H-CUT) network that achieves a significantly lower FID score. It incorporates both attention and entropy to emphasize the domain-specific region, a noisy feature mixup module to generate high variational synthetic negative patches, and a modulated noise contrastive estimation (MoNCE) loss to reweight all negative patches using optimal transport for better performance. Our proposed contrastive learning and cycle-consistency-based TTL (C3TTL) framework consists of two H-CUT networks and two classifiers. It simultaneously optimizes cycle-consistency, MoNCE, and identity losses. In C3TTL, two H-CUT networks have been employed through a bijection map** to feed the reconstructed source domain images into a pretrained classifier to guide the optimal target domain classifier. Extensive experimental analysis conducted on three ATR datasets demonstrates that the proposed C3TTL method is effective in annotating civilian and military vehicles, as well as ship targets. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: This Paper is Accepted in IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. This Arxiv version is an older version than the reviewed version

arXiv:2312.09367 [pdf, other]

Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning

Authors: Md Mahedi Hasan, Shoaib Meraj Sami, Nasser Nasrabadi

Abstract: State-of-the-art face recognition (FR) models often experience a significant performance drop when dealing with facial images in surveillance scenarios where images are in low quality and often corrupted with noise. Leveraging facial characteristics, such as freckles, scars, gender, and ethnicity, becomes highly beneficial in improving FR performance in such scenarios. In this paper, we introduce… ▽ More State-of-the-art face recognition (FR) models often experience a significant performance drop when dealing with facial images in surveillance scenarios where images are in low quality and often corrupted with noise. Leveraging facial characteristics, such as freckles, scars, gender, and ethnicity, becomes highly beneficial in improving FR performance in such scenarios. In this paper, we introduce text-guided face recognition (TGFR) to analyze the impact of integrating facial attributes in the form of natural language descriptions. We hypothesize that adding semantic information into the loop can significantly improve the image understanding capability of an FR algorithm compared to other soft biometrics. However, learning a discriminative joint embedding within the multimodal space poses a considerable challenge due to the semantic gap in the unaligned image-text representations, along with the complexities arising from ambiguous and incoherent textual descriptions of the face. To address these challenges, we introduce a face-caption alignment module (FCAM), which incorporates cross-modal contrastive losses across multiple granularities to maximize the mutual information between local and global features of the face-caption pair. Within FCAM, we refine both facial and textual features for learning aligned and discriminative features. We also design a face-caption fusion module (FCFM) that applies fine-grained interactions and coarse-grained associations among cross-modal features. Through extensive experiments conducted on three face-caption datasets, proposed TGFR demonstrates remarkable improvements, particularly on low-quality images, over existing FR models and outperforms other related methods and benchmarks. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

arXiv:2305.13886 [pdf, other]

Deep Transductive Transfer Learning for Automatic Target Recognition

Authors: Shoaib M. Sami, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the… ▽ More One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the source domain. Transductive transfer learning is an effective way to adapt a network to a new target domain by utilizing a pretrained ATR network in the source domain. We propose an unpaired transductive transfer learning framework where a CycleGAN model and a well-trained ATR classifier in the source domain are used to construct an ATR classifier in the target domain without having any labeled data in the target domain. We employ a CycleGAN model to transfer the mid-wave infrared (MWIR) images to visible (VIS) domain images (or visible to MWIR domain). To train the transductive CycleGAN, we optimize a cost function consisting of the adversarial, identity, cycle-consistency, and categorical cross-entropy loss for both the source and target classifiers. In this paper, we perform a detailed experimental analysis on the challenging DSIAC ATR dataset. The dataset consists of ten classes of vehicles at different poses and distances ranging from 1-5 kilometers on both the MWIR and VIS domains. In our experiment, we assume that the images in the VIS domain are the unlabeled target dataset. We first detect and crop the vehicles from the raw images and then project them into a common distance of 2 kilometers. Our proposed transductive CycleGAN achieves 71.56% accuracy in classifying the visible domain vehicles in the DSIAC ATR dataset. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures

Journal ref: SPIE Defense & Commercial Sensing 2023, Conference 12521, Automatic target recognition XXXIII, Orlando, Florida

arXiv:2209.08130 [pdf, other]

Robust Ensemble Morph Detection with Domain Generalization

Authors: Hossein Kashiani, Shoaib Meraj Sami, Sobhan Soleymani, Nasser M. Nasrabadi

Abstract: Although a substantial amount of studies is dedicated to morph detection, most of them fail to generalize for morph faces outside of their training paradigm. Moreover, recent morph detection methods are highly vulnerable to adversarial attacks. In this paper, we intend to learn a morph detection model with high generalization to a wide range of morphing attacks and high robustness against differen… ▽ More Although a substantial amount of studies is dedicated to morph detection, most of them fail to generalize for morph faces outside of their training paradigm. Moreover, recent morph detection methods are highly vulnerable to adversarial attacks. In this paper, we intend to learn a morph detection model with high generalization to a wide range of morphing attacks and high robustness against different adversarial attacks. To this aim, we develop an ensemble of convolutional neural networks (CNNs) and Transformer models to benefit from their capabilities simultaneously. To improve the robust accuracy of the ensemble model, we employ multi-perturbation adversarial training and generate adversarial examples with high transferability for several single models. Our exhaustive evaluations demonstrate that the proposed robust ensemble model generalizes to several morphing attacks and face datasets. In addition, we validate that our robust ensemble model gain better robustness against several adversarial attacks while outperforming the state-of-the-art studies. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: Accepted in IJCB 2022

arXiv:2208.11822 [pdf, ps, other]

Benchmarking Human Face Similarity Using Identical Twins

Authors: Shoaib Meraj Sami, John McCauley, Sobhan Soleymani, Nasser Nasrabadi, Jeremy Dawson

Abstract: The problem of distinguishing identical twins and non-twin look-alikes in automated facial recognition (FR) applications has become increasingly important with the widespread adoption of facial biometrics. Due to the high facial similarity of both identical twins and look-alikes, these face pairs represent the hardest cases presented to facial recognition tools. This work presents an application o… ▽ More The problem of distinguishing identical twins and non-twin look-alikes in automated facial recognition (FR) applications has become increasingly important with the widespread adoption of facial biometrics. Due to the high facial similarity of both identical twins and look-alikes, these face pairs represent the hardest cases presented to facial recognition tools. This work presents an application of one of the largest twin datasets compiled to date to address two FR challenges: 1) determining a baseline measure of facial similarity between identical twins and 2) applying this similarity measure to determine the impact of doppelgangers, or look-alikes, on FR performance for large face datasets. The facial similarity measure is determined via a deep convolutional neural network. This network is trained on a tailored verification task designed to encourage the network to group together highly similar face pairs in the embedding space and achieves a test AUC of 0.9799. The proposed network provides a quantitative similarity score for any two given faces and has been applied to large-scale face datasets to identify similar face pairs. An additional analysis which correlates the comparison score returned by a facial recognition tool and the similarity score returned by the proposed network has also been performed. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: 34 pages, 48 figures, Accepted in IET Biometrics Journal (5th August 2022)

arXiv:2110.11467 [pdf]

Power Transformer Fault Diagnosis with Intrinsic Time-scale Decomposition and XGBoost Classifier

Authors: Shoaib Meraj Sami, Mohammed Imamul Hassan Bhuiyan

Abstract: An intrinsic time-scale decomposition (ITD) based method for power transformer fault diagnosis is proposed. Dissolved gas analysis (DGA) parameters are ranked according to their skewness, and then ITD based features extraction is performed. An optimal set of PRC features are determined by an XGBoost classifier. For classification purpose, an XGBoost classifier is used to the optimal PRC features s… ▽ More An intrinsic time-scale decomposition (ITD) based method for power transformer fault diagnosis is proposed. Dissolved gas analysis (DGA) parameters are ranked according to their skewness, and then ITD based features extraction is performed. An optimal set of PRC features are determined by an XGBoost classifier. For classification purpose, an XGBoost classifier is used to the optimal PRC features set. The proposed method's performance in classification is studied using publicly available DGA data of 376 power transformers and employing an XGBoost classifier. The Proposed method achieves more than 95% accuracy and high sensitivity and F1-score, better than conventional methods and some recent machine learning-based fault diagnosis approaches. Moreover, it gives better Cohen Kappa and F1-score as compared to the recently introduced EMD-based hierarchical technique for fault diagnosis in power transformers. △ Less

Submitted 21 October, 2021; originally announced October 2021.

Comments: 09 pages, 3 figures, conference

arXiv:2110.11451 [pdf]

doi 10.1109/ICECE51571.2020.9393037

An EMD-based Method for the Detection of Power Transformer Faults with a Hierarchical Ensemble Classifier

Authors: Shoaib Meraj Sami, Mohammed Imamul Hassan Bhuiyan

Abstract: In this paper, an Empirical Mode Decomposition-based method is proposed for the detection of transformer faults from Dissolve gas analysis (DGA) data. Ratio-based DGA parameters are ranked using their skewness. Optimal sets of intrinsic mode function coefficients are obtained from the ranked DGA parameters. A Hierarchical classification scheme employing XGBoost is presented for classifying the fea… ▽ More In this paper, an Empirical Mode Decomposition-based method is proposed for the detection of transformer faults from Dissolve gas analysis (DGA) data. Ratio-based DGA parameters are ranked using their skewness. Optimal sets of intrinsic mode function coefficients are obtained from the ranked DGA parameters. A Hierarchical classification scheme employing XGBoost is presented for classifying the features to identify six different categories of transformer faults. Performance of the Proposed Method is studied for publicly available DGA data of 377 transformers. It is shown that the proposed method can yield more than 90% sensitivity and accuracy in the detection of transformer faults, a superior performance as compared to conventional methods as well as several existing machine learning-based techniques. △ Less

Submitted 21 October, 2021; originally announced October 2021.

Comments: 04 pages, 04 figures, Conference

Showing 1–7 of 7 results for author: Sami, S M