Search | arXiv e-print repository

Transformation-Dependent Adversarial Attacks

Authors: Yaoteng Tan, Zikui Cai, M. Salman Asif

Abstract: We introduce transformation-dependent adversarial attacks, a new class of threats where a single additive perturbation can trigger diverse, controllable mis-predictions by systematically transforming the input (e.g., scaling, blurring, compression). Unlike traditional attacks with static effects, our perturbations embed metamorphic properties to enable different adversarial attacks as a function of the transformation parameters. We demonstrate the transformation-dependent vulnerability across models (e.g., convolutional networks and vision transformers) and vision tasks (e.g., image classification and object detection). Our proposed geometric and photometric transformations enable a range of targeted errors from one crafted input (e.g., higher than 90% attack success rate for classifiers). We analyze effects of model architecture and type/variety of transformations on attack effectiveness. This work forces a paradigm shift by redefining adversarial inputs as dynamic, controllable threats. We highlight the need for robust defenses against such multifaceted, chameleon-like perturbations that current techniques are ill-prepared for.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.02575 [pdf, other]

Cross-Modal Safety Alignment: Is textual unlearning all you need?

Authors: Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song

Abstract: Recent studies reveal that integrating new modalities into Large Language Models (LLMs), such as Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques like Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF). While further SFT and RLHF-based safety training can be conducted in multi-modal settings, collecting multi-modal training datasets poses a significant challenge. Inspired by the structural design of recent multi-modal models, where, regardless of the combination of input modalities, all inputs are ultimately fused into the language space, we aim to explore whether unlearning solely in the textual domain can be effective for cross-modality safety alignment. Our evaluation across six datasets empirically demonstrates the transferability -- textual unlearning in VLMs significantly reduces the Attack Success Rate (ASR) to less than 8% and in some cases, even as low as nearly 2% for both text-based and vision-text-based attacks, alongside preserving the utility. Moreover, our experiments show that unlearning with a multi-modal dataset offers no potential benefits but incurs significantly increased computational demands, possibly up to 6 times higher.

Submitted 27 May, 2024; originally announced June 2024.

arXiv:2404.08921 [pdf, other]

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Authors: Qi Zhao, M. Salman Asif, Zhan Ma

Abstract: The primary focus of Neural Representation for Videos (NeRV) is to effectively model its spatiotemporal consistency. However, current NeRV systems often face a significant issue of spatial inconsistency, leading to decreased perceptual quality. To address this issue, we introduce the Pyramidal Neural Representation for Videos (PNeRV), which is built on a multi-scale information connection and comprises a lightweight rescaling operator, Kronecker Fully-connected layer (KFc), and a Benign Selective Memory (BSM) mechanism. The KFc, inspired by the tensor decomposition of the vanilla Fully-connected layer, facilitates low-cost rescaling and global correlation modeling. BSM merges high-level features with granular ones adaptively. Furthermore, we provide an analysis based on the Universal Approximation Theory of the NeRV system and validate the effectiveness of the proposed PNeRV. We conducted comprehensive experiments to demonstrate that PNeRV surpasses the performance of contemporary NeRV models, achieving the best results in video regression on UVG and DAVIS under various metrics (PSNR, SSIM, LPIPS, and FVD). Compared to vanilla NeRV, PNeRV achieves a +4.49 dB gain in PSNR and a 231% increase in FVD on UVG, along with a +3.28 dB PSNR and 634% FVD increase on DAVIS.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2403.10374 [pdf, other]

doi 10.1109/CAMSAP58249.2023.10403502

Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training

Authors: Edward P. Chandler, Shirin Shoushtari, Jiaming Liu, M. Salman Asif, Ulugbek S. Kamilov

Abstract: Plug-and-Play Priors (PnP) is a well-known class of methods for solving inverse problems in computational imaging. PnP methods combine physical forward models with learned prior models specified as image denoisers. A common issue with the learned models is that of a performance drop when there is a distribution shift between the training and testing data. Test-time training (TTT) was recently proposed as a general strategy for improving the performance of learned models when training and testing data come from different distributions. In this paper, we propose PnP-TTT as a new method for overcoming distribution shifts in PnP. PnP-TTT uses deep equilibrium learning (DEQ) for optimizing a self-supervised loss at the fixed points of PnP iterations. PnP-TTT can be directly applied on a single test sample to improve the generalization of PnP. We show through simulations that given a sufficient number of measurements, PnP-TTT enables the use of image priors trained on natural images for image reconstruction in magnetic resonance imaging (MRI).

Submitted 15 March, 2024; originally announced March 2024.

Journal ref: 2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2023, pg. 186-190

arXiv:2401.13722 [pdf, other]

Proactive Emotion Tracker: AI-Driven Continuous Mood and Emotion Monitoring

Authors: Mohammad Asif, Sudhakar Mishra, Ankush Sonker, Sanidhya Gupta, Somesh Kumar Maurya, Uma Shanker Tiwary

Abstract: This research project aims to tackle the growing mental health challenges in today's digital age. It employs a modified pre-trained BERT model to detect depressive text within social media and users' web browsing data, achieving an impressive 93% test accuracy. Simultaneously, the project aims to incorporate physiological signals from wearable devices, such as smartwatches and EEG sensors, to provide long-term tracking and prognosis of mood disorders and emotional states. This comprehensive approach holds promise for enhancing early detection of depression and advancing overall mental health outcomes.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.07892 [pdf, other]

Deep Fuzzy Framework for Emotion Recognition using EEG Signals and Emotion Representation in Type-2 Fuzzy VAD Space

Authors: Mohammad Asif, Noman Ali, Sudhakar Mishra, Anushka Dandawate, Uma Shanker Tiwary

Abstract: Recently, the representation of emotions in the Valence, Arousal and Dominance (VAD) space has drawn enough attention. However, the complex nature of emotions and the subjective biases in self-reported values of VAD make the emotion model too specific to a particular experiment. This study aims to develop a generic model representing emotions using a fuzzy VAD space and improve emotion recognition by utilizing this representation. We partitioned the crisp VAD space into a fuzzy VAD space using low, medium and high type-2 fuzzy dimensions to represent emotions. A framework that integrates fuzzy VAD space with EEG data has been developed to recognize emotions. The EEG features were extracted using spatial and temporal feature vectors from time-frequency spectrograms, while the subject-reported values of VAD were also considered. The study was conducted on the DENS dataset, which includes a wide range of twenty-four emotions, along with EEG data and subjective ratings. The study was validated using various deep fuzzy framework models based on type-2 fuzzy representation, cuboid probabilistic lattice representation and unsupervised fuzzy emotion clusters. These models resulted in emotion recognition accuracy of 96.09%, 95.75% and 95.31%, respectively, for the classes of 24 emotions. The study also included an ablation study, one with crisp VAD space and the other without VAD space. The result with crisp VAD space performed better, while the deep fuzzy framework outperformed both models.
The model was extended to predict cross-subject cases of emotions, and the results with 78.37% accuracy are promising, proving the generality of our model. The generic nature of the developed model, along with its successful cross-subject predictions, gives direction for real-world applications in the areas such as affective computing, human-computer interaction, and mental health monitoring.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.16221 [pdf, other]

STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation

Authors: Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, M. Salman Asif, Amit K. Roy-Chowdhury

Abstract: The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion. Traditional image-based estimators struggle with heavy occlusions due to a lack of temporal context, resulting in inconsistent predictions. While video-based models benefit from processing temporal data, they encounter limitations when faced with prolonged occlusions that extend over multiple frames. This challenge arises because these models struggle to generalize beyond their training datasets, and the variety of occlusions is hard to capture in the training data. Addressing these challenges, we propose STRIDE (Single-video based TempoRally contInuous occlusion Robust 3D Pose Estimation), a novel Test-Time Training (TTT) approach to fit a human motion prior for each video. This approach specifically handles occlusions that were not encountered during the model's training. By employing STRIDE, we can refine a sequence of noisy initial pose estimates into accurate, temporally coherent poses during test time, effectively overcoming the limitations of prior methods. Our framework demonstrates flexibility by being model-agnostic, allowing us to use any off-the-shelf 3D pose estimation method for improving robustness and temporal consistency.
We validate STRIDE's efficacy through comprehensive experiments on challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion, where it not only outperforms existing single-image and video-based pose estimation models but also showcases superior handling of substantial occlusions, achieving fast, robust, accurate, and temporally consistent 3D pose estimates.

Submitted 13 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.03864 [pdf, other]

Geometry Matching for Multi-Embodiment Grasping

Authors: Maria Attarian, Muhammad Adil Asif, Jingzhou Liu, Ruthrash Hari, Animesh Garg, Igor Gilitschenski, Jonathan Tompson

Abstract: Many existing learning-based grasping approaches concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using multiple embodiments by learning rich geometric representations for both objects and end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies supervised learning on grasping data from multiple embodiments, learning end-to-end contact point likelihood maps as well as conditional autoregressive predictions of grasps keypoint-by-keypoint. We compare our method against baselines that support multiple embodiments. Our approach performs better across three end-effectors, while also producing diverse grasps. Examples, including real robot demos, can be found at geo-match.github.io.

Submitted 6 December, 2023; originally announced December 2023.

Journal ref: 7th Annual Conference on Robot Learning, 2023

arXiv:2312.03140 [pdf, other]

FlexModel: A Framework for Interpretability of Distributed Large Language Models

Authors: Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson

Abstract: With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed computing. This often hinders contributions from researchers with machine learning expertise but limited distributed computing background. Addressing this challenge, we present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations. The library is compatible with existing model distribution libraries and encapsulates PyTorch models. It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals, bridging the gap between distributed and single-device model paradigms. Primarily, FlexModel enhances accessibility by democratizing model interactions and promotes more inclusive research in the domain of large-scale neural networks. The package is found at https://github.com/VectorInstitute/flex_model.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 14 pages, 8 figures. To appear at the Socially Responsible Language Modelling Research (SoLaR) Workshop, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2310.06124 [pdf, other]

Factorized Tensor Networks for Multi-Task and Multi-Domain Learning

Authors: Yash Garg, Nebiyou Yismaw, Rakib Hyder, Ashley Prater-Bennette, M. Salman Asif

Abstract: Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The key challenge and opportunity is to exploit shared information across tasks and domains to improve the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we propose a factorized tensor network (FTN) that can achieve accuracy comparable to independent single-task/domain networks with a small number of additional parameters. FTN uses a frozen backbone network from a source model and incrementally adds task/domain-specific low-rank tensor factors to the shared frozen network. This approach can adapt to a large number of target domains and tasks without catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters compared to existing methods. We performed experiments on widely used multi-domain and multi-task datasets. We show the experiments on convolutional-based architecture with different backbones and on transformer-based architecture. We observed that FTN achieves similar accuracy as single-task/domain methods while using only a fraction of additional parameters per task.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.03986 [pdf, other]

Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation

Authors: Md Kaykobad Reza, Ashley Prater-Bennette, M. Salman Asif

Abstract: Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing modality robustness of our proposed method on 5 different datasets for multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.

Submitted 26 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 22 pages, 3 figures, 11 tables

arXiv:2310.00133 [pdf, other]

Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis

Authors: Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov

Abstract: Plug-and-Play (PnP) priors is a widely-used family of methods for solving imaging inverse problems by integrating physical measurement models with image priors specified using image denoisers. PnP methods have been shown to achieve state-of-the-art performance when the prior is obtained using powerful deep denoisers. Despite extensive work on PnP, the topic of distribution mismatch between the training and testing data has often been overlooked in the PnP literature. This paper presents a set of new theoretical and numerical results on the topic of prior distribution mismatch and domain adaptation for alternating direction method of multipliers (ADMM) variant of PnP. Our theoretical result provides an explicit error bound for PnP-ADMM due to the mismatch between the desired denoiser and the one used for inference. Our analysis contributes to the work in the area by considering the mismatch under nonconvex data-fidelity terms and expansive denoisers. Our first set of numerical results quantifies the impact of the prior distribution mismatch on the performance of PnP-ADMM on the problem of image super-resolution. Our second set of numerical results considers a simple and effective domain adaption strategy that closes the performance gap due to the use of mismatched denoisers. Our results suggest the relative robustness of PnP-ADMM to prior distribution mismatch, while also showing that the performance gap can be significantly reduced with few training samples from the desired distribution.

Submitted 29 September, 2023; originally announced October 2023.

arXiv:2309.04001 [pdf, other]

doi 10.1109/OJSP.2024.3389812

MMSFormer: Multimodal Transformer for Material and Semantic Segmentation

Authors: Md Kaykobad Reza, Ashley Prater-Bennette, M. Salman Asif

Abstract: Leveraging information across diverse modalities is known to enhance performance on multimodal segmentation tasks. However, effectively fusing information from different modalities remains challenging due to the unique characteristics of each modality. In this paper, we propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks. MMSFormer outperforms current state-of-the-art models on three different datasets. As we begin with only one input modality, performance improves progressively as additional modalities are incorporated, showcasing the effectiveness of the fusion block in combining useful information from diverse input modalities. Ablation studies show that different modules in the fusion block are crucial for overall model performance. Furthermore, our ablation studies also highlight the capacity of different input modalities to improve performance in the identification of different types of materials. The code and pretrained models will be made available at https://github.com/csiplab/MMSFormer.

Submitted 7 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE Open Journal of Signal Processing. 15 pages, 3 figures, 9 tables

arXiv:2305.19379 [pdf, other]

doi 10.1109/ICSEC59635.2023.10329776

Inter Subject Emotion Recognition Using Spatio-Temporal Features From EEG Signal

Authors: Mohammad Asif, Diya Srivastava, Aditya Gupta, Uma Shanker Tiwary

Abstract: Inter-subject or subject-independent emotion recognition has been a challenging task in affective computing. This work is about an easy-to-implement emotion recognition model that classifies emotions from EEG signals subject independently. It is based on the famous EEGNet architecture, which is used in EEG-related BCIs. We used the Dataset on Emotion using Naturalistic Stimuli (DENS) dataset. The dataset contains the Emotional Events -- the precise information of the emotion timings that participants felt. The model is a combination of regular, depthwise and separable convolution layers of CNN to classify the emotions. The model has the capacity to learn the spatial features of the EEG channels and the temporal features of the EEG signals variability with time. The model is evaluated for the valence space ratings. The model achieved an accuracy of 73.04%.

Submitted 27 May, 2023; originally announced May 2023.

Report number: 2023 27th International Computer Science and Engineering Conference (ICSEC)

arXiv:2305.09214 [pdf]

doi 10.1007/s11042-020-10286-w

PIQI: Perceptual Image Quality Index based on Ensemble of Gaussian Process Regression

Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Hassan Khalid

Abstract: Digital images contain a lot of redundancies, therefore, compression techniques are applied to reduce the image size without loss of reasonable image quality. Same become more prominent in the case of videos which contains image sequences and higher compression ratios are achieved in low throughput networks. Assessment of quality of images in such scenarios has become of particular interest. Subjective evaluation in most of the scenarios is infeasible so objective evaluation is preferred. Among the three objective quality measures, full-reference and reduced-reference methods require an original image in some form to calculate the image quality which is unfeasible in scenarios such as broadcasting, acquisition or enhancement. Therefore, a no-reference Perceptual Image Quality Index (PIQI) is proposed in this paper to assess the quality of digital images which calculates luminance and gradient statistics along with mean subtracted contrast normalized products in multiple scales and color spaces. These extracted features are provided to a stacked ensemble of Gaussian Process Regression (GPR) to perform the perceptual quality evaluation. The performance of the PIQI is checked on six benchmark databases and compared with twelve state-of-the-art methods and competitive results are achieved. The comparison is made based on RMSE, Pearson and Spearman correlation coefficients between ground truth and predicted quality scores. The scores of 0.0552, 0.9802 and 0.9776 are achieved respectively for these metrics on CSIQ database.
Two cross-dataset evaluation experiments are performed to check the generalization of PIQI.

Submitted 16 May, 2023; originally announced May 2023.

Journal ref: Multimed Tools Appl 80, 15677 to 15700 (2021)

arXiv:2305.09141 [pdf]

doi 10.1007/s00500-021-06662-9

Deep Ensembling for Perceptual Image Quality Assessment

Authors: Nisar Ahmed, H. M. Shahzad Asif, Abdul Rauf Bhatti, Atif Khan

Abstract: Blind image quality assessment is a challenging task particularly due to the unavailability of reference information. Training a deep neural network requires a large amount of training data which is not readily available for image quality. Transfer learning is usually opted to overcome this limitation and different deep architectures are used for this purpose as they learn features differently. After extensive experiments, we have designed a deep architecture containing two CNN architectures as its sub-units. Moreover, a self-collected image database BIQ2021 is proposed with 12,000 images having natural distortions. The self-collected database is subjectively scored and is used for model training and validation. It is demonstrated that synthetic distortion databases cannot provide generalization beyond the distortion types used in the database and they are not ideal candidates for general-purpose image quality assessment. Moreover, a large-scale database of 18.75 million images with synthetic distortions is used to pretrain the model and then retrain it on benchmark databases for evaluation. Experiments are conducted on six benchmark databases three of which are synthetic distortion databases (LIVE, CSIQ and TID2013) and three are natural distortion databases (LIVE Challenge Database, CID2013 and KonIQ-10k). The proposed approach has provided a Pearson correlation coefficient of 0.8992, 0.8472 and 0.9452 subsequently and Spearman correlation coefficient of 0.8863, 0.8408 and 0.9421.
Moreover, the performance is demonstrated using perceptually weighted rank correlation to indicate the perceptual superiority of the proposed approach. Multiple experiments are conducted to validate the generalization performance of the proposed model by training on different subsets of the databases and validating on the test subset of BIQ2021 database.

Submitted 15 May, 2023; originally announced May 2023.

Journal ref: Soft Comput 26, 7601-7622 (2022)
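
Note on the metrics quoted in the abstract above: the Pearson and Spearman coefficients measure, respectively, the linear and the rank agreement between predicted quality scores and subjective scores. The sketch below is only an illustration of how such coefficients are commonly computed, not the authors' evaluation code; the function names and the sample score lists are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: model predictions vs. subjective (mean opinion) scores.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]
plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
```

In this made-up example the relationship is monotonic but not linear, so the rank correlation is perfect while the linear correlation is slightly below one. That difference is why quality-assessment results are usually reported with both coefficients.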

arXiv:2304.06544 [pdf, other]

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

Authors: Qi Zhao, M. Salman Asif, Zhan Ma

Abstract: Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2303.14304 [pdf, other]

Ensemble-based Blackbox Attacks on Dense Prediction

Authors: Zikui Cai, Yaoteng Tan, M. Salman Asif

Abstract: We propose an approach for adversarial attacks on dense prediction models (such as object detectors and segmentation). It is well known that the attacks generated by a single surrogate model do not transfer to arbitrary (blackbox) victim models. Furthermore, targeted attacks are often more challenging than untargeted attacks. In this paper, we show that a carefully designed ensemble can create effective attacks for a number of victim models. In particular, we show that normalization of the weights for individual models plays a critical role in the success of the attacks. We then demonstrate that adjusting the weights of the ensemble according to the victim model can further improve the performance of the attacks. We performed a number of experiments for object detectors and segmentation to highlight the significance of our proposed methods. Our proposed ensemble-based method outperforms existing blackbox attack methods for object detection and segmentation. Finally, we show that our proposed method can also generate a single perturbation that can fool multiple blackbox detection and segmentation models simultaneously. Code is available at https://github.com/CSIPlab/EBAD.

Submitted 24 March, 2023; originally announced March 2023.

Comments: CVPR 2023 Accepted

arXiv:2303.13269 [pdf, other]

Disguise without Disruption: Utility-Preserving Face De-Identification

Authors: Zikui Cai, Zhongpai Gao, Benjamin Planche, Meng Zheng, Terrence Chen, M. Salman Asif, Ziyan Wu

Abstract: With the rise of cameras and smart sensors, humanity generates an exponential amount of data. This valuable information, including underrepresented cases like AI in medical settings, can fuel new deep-learning tools. However, data scientists must prioritize ensuring privacy for individuals in these untapped datasets, especially for images or videos with faces, which are prime targets for identification methods. Proposed solutions to de-identify such images often compromise non-identifying facial attributes relevant to downstream tasks. In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data. Unlike previous approaches, our solution is firmly grounded in the domains of differential privacy and ensemble-learning research. Our method involves extracting and substituting depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method using multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches in various downstream tasks.

Submitted 18 December, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted at AAAI 2024. Paper + supplementary material

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 2024

arXiv:2303.06235 [pdf, other]

Compressive Sensing with Tensorized Autoencoder

Authors: Rakib Hyder, M. Salman Asif

Abstract: Deep networks can be trained to map images into a low-dimensional latent space. In many cases, different images in a collection are articulated versions of one another; for example, the same object with different lighting, background, or pose. Furthermore, in many cases, parts of images can be corrupted by noise or missing entries. In this paper, our goal is to recover images without access to the ground-truth (clean) images, using the articulations as a structural prior of the data. Such recovery problems fall under the domain of compressive sensing. We propose to learn an autoencoder with tensor ring factorization on the embedding space to impose structural constraints on the data. In particular, we use a tensor ring structure in the bottleneck layer of the autoencoder that utilizes the soft labels of the structured dataset. We empirically demonstrate the effectiveness of the proposed approach for inpainting and denoising applications. The resulting method achieves better reconstruction quality compared to other generative prior-based self-supervised recovery approaches for compressive sensing.

Submitted 10 March, 2023; originally announced March 2023.

Journal ref: ICASSP 2023

arXiv:2301.00321 [pdf, other]

Floods Relevancy and Identification of Location from Twitter Posts using NLP Techniques

Authors: Muhammad Suleman, Muhammad Asif, Tayyab Zamir, Ayaz Mehmood, Jebran Khan, Nasir Ahmad, Kashif Ahmad

Abstract: This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts, while LETT is a Named Entity Recognition (NER) task and aims at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining F1-scores of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining F1-scores of 0.6256, 0.6744, and 0.6723, respectively.

Submitted 31 December, 2022; originally announced January 2023.

Comments: 5 pages, 1 figure, and 4 tables

arXiv:2212.07778 [pdf, other]

doi 10.1109/TPAMI.2024.3359326

Efficient Visual Computing with Camera RAW Snapshots

Authors: Zhihao Li, Ming Lu, Xu Zhang, Xin Feng, M. Salman Asif, Zhan Ma

Abstract: Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel $ρ$-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in the RAW domain using RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed $ρ$-Vision generalizes across various camera sensors and different task-specific models. Additional advantages of the proposed $ρ$-Vision, which eliminates the ISP, are the potential reductions in computations and processing times.

Submitted 25 January, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted by T-PAMI 2024. Homepage: https://njuvision.github.io/rho-vision

arXiv:2211.02637 [pdf, other]

Emotion Recognition With Temporarily Localized 'Emotional Events' in Naturalistic Context

Authors: Mohammad Asif, Sudhakar Mishra, Majithia Tejas Vinodbhai, Uma Shanker Tiwary

Abstract: Emotion recognition using EEG signals is an emerging area of research due to its broad applicability in BCI. Emotional feelings are hard to stimulate in the lab. Emotions do not last long, yet they need enough context to be perceived and felt. However, most EEG-related emotion databases either suffer from emotionally irrelevant details (due to prolonged stimulus duration) or have minimal context, casting doubt on whether the stimulus elicits any emotion at all. We tried to reduce the impact of this trade-off by designing an experiment in which participants are free to report their emotional feelings while simultaneously watching the emotional stimulus. We called these reported emotional feelings "Emotional Events" in our Dataset on Emotion with Naturalistic Stimuli (DENS). We used EEG signals to classify emotional events on different combinations of Valence (V) and Arousal (A) dimensions and compared the results with the benchmark datasets DEAP and SEED. STFT is used for feature extraction, and the resulting features are fed to a classification model consisting of CNN-LSTM hybrid layers. We achieved significantly higher accuracy with our data compared to DEAP and SEED data. We conclude that having precise information about emotional feelings improves the classification accuracy compared to long-duration EEG signals, which might be contaminated by mind-wandering.

Submitted 25 October, 2022; originally announced November 2022.

arXiv:2209.12443 [pdf]

Image Quality Assessment for Foliar Disease Identification (AgroPath)

Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Gulshan Saleem, Muhammad Usman Younus

Abstract: Crop diseases are a major threat to food security and their rapid identification is important to prevent yield loss. Swift identification of these diseases is difficult due to the lack of necessary infrastructure. Recent advances in computer vision and increasing penetration of smartphones have paved the way for smartphone-assisted disease identification. Most plant diseases leave particular artifacts on the foliar structure of the plant. This study was conducted in 2020 at the Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan to check leaf-based plant disease identification. This study provided a deep neural network-based solution to foliar disease identification and incorporated image quality assessment to select the image of the required quality to perform identification, and named it Agricultural Pathologist (AgroPath). An image captured by a novice photographer may contain noise, lack of structure, and blur, which result in a failed or inaccurate diagnosis. Moreover, the AgroPath model had 99.42% accuracy for foliar disease identification. The proposed addition can be especially useful for application of foliar disease identification in the field of agriculture.

Submitted 26 September, 2022; originally announced September 2022.

Journal ref: Journal of Agricultural Research 59.2 (2021): 177-186

arXiv:2209.09883 [pdf, other]

Leveraging Local Patch Differences in Multi-Object Scenes for Generative Adversarial Attacks

Authors: Abhishek Aich, Shasha Li, Chengyu Song, M. Salman Asif, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury

Abstract: State-of-the-art generative model-based attacks against image classifiers overwhelmingly focus on single-object (i.e., single dominant object) images. Different from such settings, we tackle a more practical problem of generating adversarial perturbations using multi-object (i.e., multiple dominant objects) images as they are representative of most real-world scenes. Our goal is to design an attack strategy that can learn from such natural scenes by leveraging the local patch differences that occur inherently in such images (e.g., the difference between the local patch on the object 'person' and the object 'bike' in a traffic scene). Our key idea is to misclassify an adversarial multi-object image by confusing the victim classifier for each local patch in the image. Based on this, we propose a novel generative attack (called Local Patch Difference or LPD-Attack) where a novel contrastive loss function uses the aforesaid local differences in feature space of multi-object scenes to optimize the perturbation generator. Through various experiments across diverse victim convolutional neural networks, we show that our approach outperforms baseline generative attacks with highly transferable perturbations when evaluated under different white-box and black-box settings.

Submitted 3 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted at WACV 2023 (Round 1), camera-ready version

arXiv:2209.09502 [pdf, other]

GAMA: Generative Adversarial Multi-Object Scene Attacks

Authors: Abhishek Aich, Calvin-Khang Ta, Akash Gupta, Chengyu Song, Srikanth V. Krishnamurthy, M. Salman Asif, Amit K. Roy-Chowdhury

Abstract: The majority of methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). On the other hand, natural scenes include multiple dominant objects that are semantically related. Thus, it is crucial to explore designing attack strategies that look beyond learning on single-object scenes or attacking single-object victim classifiers. Because perturbations from generative models inherently transfer strongly to unknown models, this paper presents the first approach of using generative models for adversarial attacks on multi-object scenes. In order to represent the relationships between different objects in the input scene, we leverage the open-sourced pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation to exploit the encoded semantics in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool to train formidable perturbation generators for multi-object scenes. Using the joint image-text features to train the generator, we show that GAMA can craft potent transferable perturbations in order to fool victim classifiers in various attack settings. For example, GAMA triggers ~16% more misclassification than state-of-the-art generative approaches in black-box settings where both the classifier architecture and data distribution of the attacker are different from the victim. Our code is available here: https://abhishekaich27.github.io/gama.html

Submitted 15 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS 2022; First two authors contributed equally; Includes Supplementary Material

arXiv:2209.04829 [pdf, other]

Energy-Efficient Beamforming and Resource Optimization for AmBSC-Assisted Cooperative NOMA IoT Networks

Authors: Muhammad Asif, Asim Ihsan, Wali Ullah Khan, Ali Ranjha, Shengli Zhang, Sissi Xiaoxiao Wu

Abstract: In this manuscript, we present an energy-efficient alternating optimization framework based on the multi-antenna ambient backscatter communication (AmBSC) assisted cooperative non-orthogonal multiple access (NOMA) for next-generation (NG) internet-of-things (IoT) enabled communication networks. Specifically, the energy-efficiency maximization is achieved for the considered AmBSC-enabled multi-cluster cooperative IoT NOMA system by optimizing the active-beamforming vector and power-allocation coefficients (PAC) of IoT NOMA users at the transmitter, as well as the passive-beamforming vector at the multi-antenna assisted backscatter node. Usually, increasing the number of IoT NOMA users in each cluster results in inter-cluster interference (ICI) (among different clusters) and intra-cluster interference (among IoT NOMA users). To combat the impact of ICI, we exploit zero-forcing (ZF) based active-beamforming, as well as an efficient clustering technique at the source node. Further, the effect of intra-cluster interference is mitigated by exploiting an efficient power-allocation policy that determines the PAC of IoT NOMA users under the quality-of-service (QoS), cooperation, SIC decoding, and power-budget constraints. Moreover, the considered non-convex passive-beamforming problem is transformed into a standard semi-definite programming (SDP) problem by exploiting the successive convex approximation (SCA), as well as the difference of convex (DC) programming, where a rank-1 solution of passive-beamforming is obtained based on the penalty-based method. Furthermore, the numerical analysis of simulation results demonstrates that the proposed energy-efficiency maximization algorithm performs efficiently, achieving convergence within only a few iterations.

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2208.03705 [pdf, other]

Rate Splitting Multiple Access for Next Generation Cognitive Radio Enabled LEO Satellite Networks

Authors: Wali Ullah Khan, Zain Ali, Eva Lagunas, Asad Mahmood, Muhammad Asif, Asim Ihsan, Symeon Chatzinotas, Björn Ottersten, Octavia A. Dobre

Abstract: This paper proposes a cognitive radio enabled LEO SatCom using the RSMA radio access technique with the coexistence of a GEO SatCom network. In particular, this work aims to maximize the sum rate of LEO SatCom by simultaneously optimizing the power budget over different beams, RSMA power allocation for users over each beam, and subcarrier user assignment while restricting the interference temperature to GEO SatCom. The problem of sum rate maximization is formulated as a non-convex problem, where the global optimal solution is challenging to obtain. Thus, an efficient solution can be obtained in three steps. First, we employ a successive convex approximation technique to reduce the complexity and make the problem more tractable. Second, for any given resource block user assignment, we adopt KKT conditions to calculate the transmit power over different beams and RSMA power allocation of users over each beam. Third, using the allocated power, we design an efficient algorithm based on the greedy approach for resource block user assignment. Numerical results demonstrate the benefits of the proposed optimization scheme compared to the benchmark schemes.

Submitted 6 February, 2023; v1 submitted 7 August, 2022; originally announced August 2022.

Comments: 32,9. arXiv admin note: substantial text overlap with arXiv:2208.02924

arXiv:2208.03610 [pdf, other]

Blackbox Attacks via Surrogate Ensemble Search

Authors: Zikui Cai, Chengyu Song, Srikanth Krishnamurthy, Amit Roy-Chowdhury, M. Salman Asif

Abstract: Blackbox adversarial attacks can be categorized into transfer- and query-based attacks. Transfer methods do not require any feedback from the victim model, but provide lower success rates compared to query-based methods. Query attacks often require a large number of queries for success. To achieve the best of both approaches, recent efforts have tried to combine them, but still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for Blackbox Attacks via Surrogate Ensemble Search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights in the loss function using queries generated by the perturbation machine. Since the dimension of the search space is small (same as the number of surrogate models), the search requires a small number of queries. We demonstrate that our proposed method achieves a better success rate with at least 30x fewer queries compared to state-of-the-art methods on different image classifiers trained with ImageNet. In particular, our method requires as few as 3 queries per image (on average) to achieve more than a 90% success rate for targeted attacks and 1-2 queries per image for over a 99% success rate for untargeted attacks. Our method is also effective on Google Cloud Vision API and achieved a 91% untargeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our proposed method are highly transferable and can be adopted for hard-label blackbox attacks. We also show the effectiveness of BASES for hiding attacks on object detectors.

Submitted 23 November, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

Comments: Our code is available at https://github.com/CSIPlab/BASES

Journal ref: NeurIPS 2022

arXiv:2208.02436 [pdf, other]

H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

Authors: Ming Cheng, Yiling Xu, Wang Shen, M. Salman Asif, Chao Ma, Jun Sun, Zhan Ma

Abstract: High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. The LIFnet is trained in an end-to-end manner using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, of our system to fully understand its capability for potential applications.

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2208.01123 [pdf, other]

Energy-Efficient Backscatter-Assisted Coded Cooperative-NOMA for B5G Wireless Communications

Authors: Muhammad Asif, Asim Ihsan, Wali Ullah Khan, Ali Ranjha, Shengli Zhang, Sissi Xiaoxiao Wu

Abstract: In this manuscript, we propose an alternating optimization framework to maximize the energy efficiency of a backscatter-enabled cooperative non-orthogonal multiple access (NOMA) system by optimizing the transmit power of the source, power allocation coefficients (PAC), and power of the relay node under imperfect successive interference cancellation (SIC) decoding. A three-stage low-complexity energy-efficient alternating optimization algorithm is introduced which optimizes the transmit power, PAC, and relay power by considering the quality of service (QoS), power budget, and cooperation constraints. Subsequently, a joint channel coding framework is introduced to enhance the performance of the far user, which has no direct communication link with the base station (BS) and has bad channel conditions. In the destination node, the far user data is jointly decoded using a sum-product algorithm (SPA) based joint iterative decoder realized by jointly-designed quasi-cyclic low-density parity-check (QC-LDPC) codes. Simulation results evince that the proposed backscatter-enabled cooperative NOMA system outperforms its counterpart in terms of energy efficiency. Also, the proposed jointly-designed QC-LDPC codes provide excellent bit-error-rate (BER) performance by jointly decoding the far user data for the considered BSC cooperative NOMA system with only a few decoding iterations.

Submitted 20 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 30, 7

arXiv:2207.09074 [pdf, other]

Incremental Task Learning with Incremental Rank Updates

Authors: Rakib Hyder, Ken Shao, Boyu Hou, Panos Markopoulos, Ashley Prater-Bennette, M. Salman Asif

Abstract: Incremental Task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where training data for each task is only available during the training of that task. Neural networks tend to forget older tasks when they are trained for the newer tasks; this property is often known as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights for each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add that to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency compared to episodic memory- and mask-based approaches. Our code will be available at https://github.com/CSIPlab/task-increment-rank-update.git

Submitted 19 July, 2022; originally announced July 2022.

Comments: Code will be available at https://github.com/CSIPlab/task-increment-rank-update.git

Journal ref: ECCV 2022

arXiv:2207.03295 [pdf, other]

Cooperative Backscatter NOMA with Imperfect SIC: Towards Energy Efficient Sum Rate Maximization in Sustainable 6G Networks

Authors: Manzoor Ahmed, Zain Ali, Wali Ullah Khan, Omer Waqar, Muhammad Asif, Abd Ullah Khan, Muhammad Awais Javed, Fahd N. Al-Wesabi

Abstract: The combination of backscatter communication with non-orthogonal multiple access (NOMA) has the potential to support low-powered massive connections in upcoming sixth-generation (6G) wireless networks. More specifically, backscatter communication can harvest and use the existing RF signals in the atmosphere for communication, while NOMA provides communication to multiple wireless devices over the same frequency and time resources. This paper proposes a new resource management framework for backscatter-aided cooperative NOMA communication in upcoming 6G networks. In particular, the proposed work simultaneously optimizes the base station's transmit power, relaying node, the reflection coefficient of the backscatter tag, and time allocation under imperfect successive interference cancellation to maximize the sum rate of the system. To obtain an efficient solution for the resource management framework, we propose a combination of the bisection method and dual theory, where the sub-gradient method is adopted to optimize the Lagrangian multipliers. Numerical results show that the proposed solution provides excellent performance. When the proposed technique is compared to a brute-force search, which guarantees the optimal solution but is very time-consuming, the gap in performance is 0%. Hence, the proposed framework matches the performance of a cumbersome brute-force search while offering much lower complexity. The works in the literature on cooperative NOMA considered equal time distribution for cooperation and direct communication. Our results show that optimizing the time division can increase the performance by more than 110% for high transmission powers.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 9, 7

arXiv:2204.05172 [pdf, other]

Event Transformer

Authors: Bin Jiang, Zhihao Li, M. Salman Asif, Xun Cao, Zhan Ma

Abstract: The event camera's low power consumption and ability to capture microsecond brightness changes make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs). However, these approaches often sacrifice temporal granularity or require specialized devices for processing. This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token. This approach preserves the sequence's intricate spatiotemporal attributes at the event level. Moreover, we propose a Three-way Attention mechanism in the Event Transformer Block (ETB) to collaboratively construct temporal and spatial correlations between events. We compare our proposed token-based event representation extensively with other prevalent methods for object classification and optical flow estimation. The experimental results showcase its competitive performance while demanding minimal computational resources on standard devices. Our code is publicly accessible at https://github.com/NJUVISION/EventTransformer.

Submitted 12 June, 2024; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted by ICASSP2024

arXiv:2203.15230 [pdf, other]

Zero-Query Transfer Attacks on Context-Aware Object Detectors

Authors: Zikui Cai, Shantanu Rane, Alejandro E. Brito, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif

Abstract: Adversarial attacks perturb images such that a deep neural network produces incorrect classification results. A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check, wherein, if the detected objects are not consistent with an appropriately defined context, then an attack is suspected. Stronger attacks are needed to fool such context-aware detectors. We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check of black-box object detectors operating on complex, natural scenes. Unlike many black-box attacks that perform repeated attempts and open themselves to detection, we assume a "zero-query" setting, where the attacker has no knowledge of the classification decisions of the victim system. First, we derive multiple attack plans that assign incorrect labels to victim objects in a context-consistent manner. Then we design and use a novel data structure that we call the perturbation success probability matrix, which enables us to filter the attack plans and choose the one most likely to succeed. This final attack plan is implemented using a perturbation-bounded adversarial attack algorithm. We compare our zero-query attack against a few-query scheme that repeatedly checks if the victim system is fooled. We also compare against state-of-the-art context-agnostic attacks. Against a context-aware defense, the fooling rate of our zero-query approach is significantly higher than context-agnostic approaches and higher than that achievable with up to three rounds of the few-query scheme.

Submitted 29 March, 2022; originally announced March 2022.

Comments: CVPR 2022 Accepted

arXiv:2203.02026 [pdf, other]

Provable and Efficient Continual Representation Learning

Authors: Yingcong Li, Mingchen Li, M. Salman Asif, Samet Oymak

Abstract: In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting. While there is a rich set of techniques for CL, relatively little understanding exists on how representations built by previous tasks benefit new tasks that are added to the network. To address this, we study the problem of continual representation learning (CRL) where we learn an evolving representation as new tasks arrive. Focusing on zero-forgetting methods where tasks are embedded in subnetworks (e.g., PackNet), we first provide experiments demonstrating CRL can significantly boost sample efficiency when learning new tasks. To explain this, we establish theoretical guarantees for CRL by providing sample complexity and generalization error bounds for new tasks by formalizing the statistical benefits of previously-learned representations. Our analysis and experiments also highlight the importance of the order in which we learn the tasks. Specifically, we show that CL benefits if the initial tasks have large sample size and high "representation diversity". Diversity ensures that adding new tasks incurs small representation mismatch and that they can be learned with few samples while training only a few additional nonzero weights. Finally, we ask whether one can ensure each task subnetwork is efficient at inference time while retaining the benefits of representation learning. To this end, we propose an inference-efficient variation of PackNet called Efficient Sparse PackNet (ESPN) which employs joint channel & weight pruning. ESPN embeds tasks in channel-sparse subnets requiring up to 80% fewer FLOPs to compute while approximately retaining accuracy and is very competitive with a variety of baselines. In summary, this work takes a step towards data and compute-efficient CL with a representation learning perspective. GitHub page: https://github.com/ucr-optml/CtRL

Submitted 7 November, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2201.09493 [pdf, other]

STRIDE-based Cyber Security Threat Modeling for IoT-enabled Precision Agriculture Systems

Authors: Md. Rashid Al Asif, Khondokar Fida Hasan, Md Zahidul Islam, Rahamatullah Khondoker

Abstract: The concept of traditional farming is changing rapidly with the introduction of smart technologies like the Internet of Things (IoT). Under the concept of smart agriculture, precision agriculture is gaining popularity to enable Decision Support System (DSS)-based farming management that utilizes widespread IoT sensors and wireless connectivity to enable automated detection and optimization of resources. Undoubtedly, the success of such a system would affect crop productivity, and its failure would have a severe impact. Like many other cyber-physical systems, one of the growing challenges to avoid system adversity is to ensure the system's security, privacy, and trust. But what are the vulnerabilities, threats, and security issues we should consider while deploying precision agriculture? This paper conducts holistic threat modeling at the component level of precision agriculture's standard infrastructure, using the popular threat intelligence tool STRIDE to identify common security issues. Our modeling identifies fifty-eight potential security threats to consider. We present them systematically and offer general mitigation suggestions to support cyber security in precision agriculture.

Submitted 30 January, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.13893 [pdf]

Non-Reference Quality Monitoring of Digital Images using Gradient Statistics and Feedforward Neural Networks

Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Hassan Khalid

Abstract: Digital images contain a lot of redundancy; therefore, compression is applied to reduce the image size while retaining reasonable image quality. This becomes more prominent in the case of videos, which contain image sequences, and higher compression ratios are used in low-throughput networks. Assessment of the quality of images in such scenarios becomes of particular interest. Subjective evaluation is infeasible in most of these scenarios, so objective evaluation is preferred. Among the three objective quality measures, full-reference and reduced-reference methods require an original image in some form to calculate the quality score, which is not feasible in scenarios such as broadcasting or IP video. Therefore, a non-reference quality metric is proposed to assess the quality of digital images, which calculates luminance and multiscale gradient statistics along with mean subtracted contrast normalized products as features to train a Feedforward Neural Network with Scaled Conjugate Gradient. The trained network has provided good regression and R2 measures, and further testing on LIVE Image Quality Assessment database release-2 has shown promising results. Pearson, Kendall, and Spearman's correlations are calculated between predicted and actual quality scores, and their results are comparable to the state-of-the-art systems. Moreover, the proposed metric is computationally faster than its counterparts and can be used for the quality assessment of image sequences.

Submitted 27 December, 2021; originally announced December 2021.

Comments: Fifth International Conference on Aerospace Science & Engineering (ICASE 2017) (ICASE Proceedings, Page No. 300-305)

MSC Class: 94A08 ACM Class: I.4.5; I.5.4

arXiv:2112.03223 [pdf, other]

Context-Aware Transfer Attacks for Object Detection

Authors: Zikui Cai, Xinxin Xie, Shasha Li, Mingjun Yin, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif

Abstract: Blackbox transfer attacks for image classifiers have been extensively studied in recent years. In contrast, little progress has been made on transfer attacks for object detectors. Object detectors take a holistic view of the image and the detection of one object (or lack thereof) often depends on other objects in the scene. This makes such detectors inherently context-aware and adversarial attacks in this space are more challenging than those targeting image classifiers. In this paper, we present a new approach to generate context-aware attacks for object detectors. We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks that achieve higher transfer success rates on blackbox object detectors than the state-of-the-art. We test our approach on a variety of object detectors with images from PASCAL VOC and MS COCO datasets and demonstrate up to 20 percentage points improvement in performance compared to the other state-of-the-art methods.

Submitted 6 December, 2021; originally announced December 2021.

Comments: accepted to AAAI 2022

arXiv:2111.12862 [pdf, other]

doi 10.1109/TCI.2023.3234898

Coded Illumination for Improved Lensless Imaging

Authors: Yucheng Zheng, M. Salman Asif

Abstract: Mask-based lensless cameras can be flat, thin, and light-weight, which makes them suitable for novel designs of computational imaging systems with large surface areas and arbitrary shapes. Despite recent progress in lensless cameras, the quality of images recovered from the lensless cameras is often poor due to the ill-conditioning of the underlying measurement system. In this paper, we propose to use coded illumination to improve the quality of images reconstructed with lensless cameras. In our imaging model, the scene/object is illuminated by multiple coded illumination patterns as the lensless camera records sensor measurements. We designed and tested a number of illumination patterns and observed that shifting dots (and related orthogonal) patterns provide the best overall performance. We propose a fast and low-complexity recovery algorithm that exploits the separability and block-diagonal structure in our system. We present simulation results and hardware experiment results to demonstrate that our proposed method can significantly improve the reconstruction quality.

Submitted 9 January, 2023; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: Supplementary material, codes, and data are available at https://github.com/CSIPlab/codedcam

Journal ref: IEEE Transactions on Computational Imaging, 2023

arXiv:2110.12321 [pdf, other]

ADC: Adversarial attacks against object Detection that evade Context consistency checks

Authors: Mingjun Yin, Shasha Li, Chengyu Song, M. Salman Asif, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

Abstract: Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples, which are slightly perturbed input images that lead DNNs to make wrong predictions. To protect from such examples, various defense strategies have been proposed. A very recent defense strategy for detecting adversarial examples, which has been shown to be robust to current attacks, is to check for intrinsic context consistencies in the input data, where context refers to various relationships (e.g., object-to-object co-occurrence relationships) in images. In this paper, we show that even context consistency checks can be brittle to properly crafted adversarial examples and, to the best of our knowledge, we are the first to do so. Specifically, we propose an adaptive framework to generate examples that subvert such defenses, namely, Adversarial attacks against object Detection that evade Context consistency checks (ADC). In ADC, we formulate a joint optimization problem which has two attack goals, viz., (i) fooling the object detector and (ii) evading the context consistency check system, at the same time. Experiments on both PASCAL VOC and MS COCO datasets show that examples generated with ADC fool the object detector with a success rate of over 85% in most cases, and at the same time evade the recently proposed context consistency checks, with a bypassing rate of over 80% in most cases. Our results suggest that how to robustly model context and check its consistency is still an open problem.

Submitted 23 October, 2021; originally announced October 2021.

Comments: WACV'22 Accepted

arXiv:2110.01823 [pdf, other]

Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations

Authors: Shasha Li, Abhishek Aich, Shitong Zhu, M. Salman Asif, Chengyu Song, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

Abstract: When compared to the image classification models, black-box adversarial attacks against video classification models have been largely understudied. This could be possible because, with video, the temporal dimension poses significant additional challenges in gradient estimation. Query-efficient black-box attacks rely on effectively estimated gradients towards maximizing the probability of misclassifying the target video. In this work, we demonstrate that such effective gradients can be searched for by parameterizing the temporal structure of the search space with geometric transformations. Specifically, we design a novel iterative algorithm Geometric TRAnsformed Perturbations (GEO-TRAP), for attacking video classification models. GEO-TRAP employs standard geometric transformation operations to reduce the search space for effective gradients into searching for a small group of parameters that define these operations. This group of parameters describes the geometric progression of gradients, resulting in a reduced and structured search space. Our algorithm inherently leads to successful perturbations with surprisingly few queries. For example, adversarial examples generated from GEO-TRAP have better attack success rates with ~73.55% fewer queries compared to the state-of-the-art method for video adversarial attacks on the widely used Jester dataset. Overall, our algorithm exposes vulnerabilities of diverse video classification models and achieves new state-of-the-art results under black-box settings on two large datasets. Code is available here: https://github.com/sli057/Geo-TRAP

Submitted 26 October, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted at NeurIPS 2021; First two authors contributed equally; Includes Supplementary Material

arXiv:2108.08421 [pdf, other]

Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes

Authors: Mingjun Yin, Shasha Li, Zikui Cai, Chengyu Song, M. Salman Asif, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy

Abstract: Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks (e.g., by checking the object co-occurrence relationships in complex scenes). However, existing approaches are tied to specific models and do not offer generalizability. Motivated by the observation that language descriptions of natural scene images have already captured the object co-occurrence relationships that can be learned by a language model, we develop a novel approach to perform context consistency checks using such language models. The distinguishing aspect of our approach is that it is independent of the deployed object detector and yet offers very high accuracy in terms of detecting adversarial examples in practical scenes with multiple objects.

Submitted 18 August, 2021; originally announced August 2021.

Comments: ICCV'21 Accepted

arXiv:2108.07966 [pdf, other]

A Simple Framework for 3D Lensless Imaging with Programmable Masks

Authors: Yucheng Zheng, Yi Hua, Aswin C. Sankaranarayanan, M. Salman Asif

Abstract: Lensless cameras provide a framework to build thin imaging systems by replacing the lens in a conventional camera with an amplitude or phase mask near the sensor. Existing methods for lensless imaging can recover the depth and intensity of the scene, but they require solving computationally-expensive inverse problems. Furthermore, existing methods struggle to recover dense scenes with large depth variations. In this paper, we propose a lensless imaging system that captures a small number of measurements using different patterns on a programmable mask. In this context, we make three contributions. First, we present a fast recovery algorithm to recover textures on a fixed number of depth planes in the scene. Second, we consider the mask design problem, for programmable lensless cameras, and provide a design template for optimizing the mask patterns with the goal of improving depth estimation. Third, we use a refinement network as a post-processing step to identify and remove artifacts in the reconstruction. These modifications are evaluated extensively with experimental results on a lensless camera prototype to showcase the performance benefits of the optimized masks and recovery algorithms over the state of the art.

Submitted 18 August, 2021; originally announced August 2021.

Comments: Supplementary material available at https://github.com/CSIPlab/Programmable3Dcam.git

Journal ref: International Conference on Computer Vision (ICCV) 2021

arXiv:2108.02605 [pdf, other]

EENLP: Cross-lingual Eastern European NLP Index

Authors: Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George Dima, Réka Cserháti, Md. Sadek Hossain Asif, Matt Sárdi

Abstract: Motivated by the sparsity of NLP resources for Eastern European languages, we present a broad index of existing Eastern European language resources (90+ datasets and 45+ models) published as a GitHub repository open for updates from the community. Furthermore, to support the evaluation of commonsense reasoning tasks, we provide hand-crafted cross-lingual datasets for five different semantic tasks (namely news categorization, paraphrase detection, Natural Language Inference (NLI) task, tweet sentiment detection, and news sentiment detection) for some of the Eastern European languages. We perform several experiments with the existing multilingual models on these datasets to define the performance baselines and compare them to the existing results for other languages.

Submitted 10 May, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted for LREC 2022. 5 pages, 1 figure. Originally EEML 2021 project

MSC Class: 68T50

arXiv:2106.03668 [pdf, other]

Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition

Authors: Jiaming Liu, M. Salman Asif, Brendt Wohlberg, Ulugbek S. Kamilov

Abstract: The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors. While the empirical imaging performance and the theoretical convergence properties of these algorithms have been widely investigated, their recovery properties have not previously been theoretically analyzed. We address this gap by showing how to establish theoretical recovery guarantees for PnP/RED by assuming that the solution of these methods lies near the fixed-points of a deep neural network. We also present numerical results comparing the recovery performance of PnP/RED in compressive sensing against that of recent compressive sensing algorithms based on generative models. Our numerical results suggest that PnP with a pre-trained artifact removal network provides significantly better results compared to the existing state-of-the-art methods.

Submitted 26 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 27 pages, 13 figures

arXiv:2105.06371 [pdf, other]

Provably Convergent Algorithms for Solving Inverse Problems Using Generative Models

Authors: Viraj Shah, Rakib Hyder, M. Salman Asif, Chinmay Hegde

Abstract: The traditional approach of hand-crafting priors (such as sparsity) for solving inverse problems is slowly being replaced by the use of richer learned priors (such as those modeled by deep generative networks). In this work, we study the algorithmic aspects of such a learning-based approach from a theoretical perspective. For certain generative network architectures, we establish a simple non-convex algorithmic approach that (a) theoretically enjoys linear convergence guarantees for certain linear and nonlinear inverse problems, and (b) empirically improves upon conventional techniques such as back-propagation. We support our claims with the experimental results for solving various inverse problems. We also propose an extension of our approach that can handle model mismatch (i.e., situations where the generative network prior is not exactly applicable). Together, our contributions serve as building blocks towards a principled use of generative models in inverse problems with more complete algorithmic understanding.

Submitted 13 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:1810.03587, arXiv:1802.08406

arXiv:2102.05755 [pdf]

Development of Crop Yield Estimation Model using Soil and Environmental Parameters

Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Gulshan Saleem, Muhammad Usman Younus

Abstract: Crop yield is affected by various soil and environmental parameters and can vary significantly. Therefore, a crop yield estimation model which can predict pre-harvest yield is required for food security. The study is conducted on tea farms operating under the National Tea Research Institute, Pakistan. The data is recorded on a monthly basis over a ten-year period. The parameters collected are minimum and maximum temperature, humidity, rainfall, pH level of the soil, usage of pesticide and labor expertise. The design of the model incorporated all of these parameters and identified the parameters which are most crucial for yield predictions. Feature transformation is performed to obtain a better-performing model. The designed model is based on an ensemble of neural networks and provided an R-squared of 0.9461 and RMSE of 0.1204, indicating the usability of the proposed model in yield forecasting based on surface and environmental parameters.

Submitted 10 February, 2021; originally announced February 2021.

Comments: crop yield forecasting, regression, data mining, artificial neural network, ensemble learning

Journal ref: Journal of Agricultural Research, 2021

arXiv:2102.04515 [pdf]

Leaf Image-based Plant Disease Identification using Color and Texture Features

Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Gulshan Saleem

Abstract: Identification of plant disease is usually done through visual inspection or during laboratory examination, which causes delays resulting in yield loss by the time identification is complete. On the other hand, complex deep learning models perform the task with reasonable performance, but due to their large size and high computational requirements they are not suited to mobile and handheld devices. Our proposed approach contributes automated identification of plant diseases which follows a sequence of steps involving pre-processing, segmentation of diseased leaf area, calculation of features based on the Gray-Level Co-occurrence Matrix (GLCM), feature selection and classification. In this study, six color features and twenty-two texture features have been calculated. A support vector machine is used to perform one-vs-one classification of plant disease. The proposed model of disease identification provides an accuracy of 98.79% with a standard deviation of 0.57 on 10-fold cross-validation. The accuracy on a self-collected dataset is 82.47% for disease identification and 91.40% for healthy and diseased classification. The reported performance measures are better or comparable to the existing approaches and highest among the feature-based methods, presenting it as the most suitable method for automated leaf-based plant disease identification. This prototype system can be extended by adding more disease categories or targeting specific crop or disease categories.

Submitted 8 February, 2021; originally announced February 2021.

arXiv:2007.14621 [pdf, other]

Solving Phase Retrieval with a Learned Reference

Authors: Rakib Hyder, Zikui Cai, M. Salman Asif

Abstract: Fourier phase retrieval is a classical problem that deals with the recovery of an image from the amplitude measurements of its Fourier coefficients. Conventional methods solve this problem via iterative (alternating) minimization by leveraging some prior knowledge about the structure of the unknown image. The inherent ambiguities about shift and flip in the Fourier measurements make this problem especially difficult; and most of the existing methods use several random restarts with different permutations. In this paper, we assume that a known (learned) reference is added to the signal before capturing the Fourier amplitude measurements. Our method is inspired by the principle of adding a reference signal in holography. To recover the signal, we implement an iterative phase retrieval method as an unrolled network. Then we use back propagation to learn the reference that provides us the best reconstruction for a fixed number of phase retrieval iterations. We performed a number of simulations on a variety of datasets under different conditions and found that our proposed method for phase retrieval via unrolled network and learned reference provides near-perfect recovery at fixed (small) computational cost. We compared our method with standard Fourier phase retrieval methods and observed significant performance enhancement using the learned reference.

Submitted 29 July, 2020; originally announced July 2020.

Comments: Accepted to ECCV 2020. Code is available at https://github.com/CSIPlab/learnPR_reference
