-
Transformation-Dependent Adversarial Attacks
Authors:
Yaoteng Tan,
Zikui Cai,
M. Salman Asif
Abstract:
We introduce transformation-dependent adversarial attacks, a new class of threats where a single additive perturbation can trigger diverse, controllable mis-predictions by systematically transforming the input (e.g., scaling, blurring, compression). Unlike traditional attacks with static effects, our perturbations embed metamorphic properties to enable different adversarial attacks as a function of the transformation parameters. We demonstrate the transformation-dependent vulnerability across models (e.g., convolutional networks and vision transformers) and vision tasks (e.g., image classification and object detection). Our proposed geometric and photometric transformations enable a range of targeted errors from one crafted input (e.g., higher than 90% attack success rate for classifiers). We analyze effects of model architecture and type/variety of transformations on attack effectiveness. This work forces a paradigm shift by redefining adversarial inputs as dynamic, controllable threats. We highlight the need for robust defenses against such multifaceted, chameleon-like perturbations that current techniques are ill-prepared for.
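As a sketch of the attack objective in our own notation (an assumed formalization; the symbols $f$, $T_{\theta_k}$, $y_k$, $\mathcal{L}$, and $\epsilon$ are not taken from the paper), a single perturbation $\delta$ is optimized so that each transformation setting $\theta_k$ drives the model toward its own target label $y_k$:

```latex
\min_{\|\delta\|_\infty \le \epsilon} \; \sum_{k=1}^{K} \mathcal{L}\big(f(T_{\theta_k}(x + \delta)),\, y_k\big)
```

Here $f$ is the victim model, $T_\theta$ could be a rescaling, blur, or compression with parameter $\theta$, and $\mathcal{L}$ a classification loss such as cross-entropy; a conventional targeted attack corresponds to $K = 1$ with $T$ the identity.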
Submitted 12 June, 2024;
originally announced June 2024.
-
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Authors:
Trishna Chakraborty,
Erfan Shayegani,
Zikui Cai,
Nael Abu-Ghazaleh,
M. Salman Asif,
Yue Dong,
Amit K. Roy-Chowdhury,
Chengyu Song
Abstract:
Recent studies reveal that integrating new modalities into Large Language Models (LLMs), as in Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques like Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF). While further SFT and RLHF-based safety training can be conducted in multi-modal settings, collecting multi-modal training datasets poses a significant challenge. Inspired by the structural design of recent multi-modal models, where, regardless of the combination of input modalities, all inputs are ultimately fused into the language space, we aim to explore whether unlearning solely in the textual domain can be effective for cross-modality safety alignment. Our evaluation across six datasets empirically demonstrates this transferability: textual unlearning in VLMs significantly reduces the Attack Success Rate (ASR) to less than 8% and, in some cases, to nearly 2% for both text-based and vision-text-based attacks, while preserving utility. Moreover, our experiments show that unlearning with a multi-modal dataset offers no additional benefit but incurs significantly higher computational demands, possibly up to 6 times higher.
Submitted 27 May, 2024;
originally announced June 2024.
-
Understanding Emotional Hijacking in Metaverse
Authors:
Syed Ali Asif,
Philip Gable,
Chien-Chung Shen,
Yan-Ming Chiou
Abstract:
Emotions are an integral part of being human, and experiencing a range of emotions is what makes life rich and vibrant. From basic emotions like anger, fear, happiness, and sadness to more complex ones like excitement and grief, emotions help us express ourselves and connect with the world around us. In recent years, researchers have begun adopting virtual reality (VR) technology to evoke emotions as realistically as possible and to quantify the strength of emotions from electroencephalogram (EEG) signals measured from the brain, in order to better understand human emotions in realistic situations. This is achieved by creating a sense of presence in the virtual environment, the feeling that the user is there. For instance, [6] studied the excitement of a rollercoaster ride in VR, and [5] studied the fear of navigating in a VR cave.
Submitted 23 April, 2024;
originally announced May 2024.
-
Protecting Human Users Against Cognitive Attacks in Immersive Environments
Authors:
Yan-Ming Chiou,
Bob Price,
Chien-Chung Shen,
Syed Ali Asif
Abstract:
Integrating mixed reality (MR) with artificial intelligence (AI) technologies, including vision, language, audio, reasoning, and planning, enables the AI-powered MR assistant [1] to substantially elevate human efficiency. This enhancement comes from situational awareness, quick access to essential information, and support in learning new skills in the right context throughout everyday tasks. This blend transforms interactions with both the virtual and physical environments, catering to a range of skill levels and personal preferences. For instance, computer vision enables the understanding of the user's environment, allowing for the provision of timely and relevant digital overlays in MR systems. At the same time, language models enhance comprehension of contextual information and support voice-activated dialogue to answer user questions. However, as AI-driven MR systems advance, they also unveil new vulnerabilities, posing a threat to user safety by potentially exposing them to grave dangers [5, 6].
Submitted 23 April, 2024;
originally announced May 2024.
-
Safeguarding People's Financial Health in Metaverse with Emotionally Intelligent Virtual Buddy
Authors:
Syed Ali Asif,
Emma Cao,
Hang Chen,
Chien-Chung Shen,
Yan-Ming Chiou
Abstract:
The Metaverse, an immersive virtual world, has emerged as a shared space where people engage in various activities ranging from social interactions to commerce. Cryptocurrencies [3] and Non-Fungible Tokens (NFTs) [6] play pivotal roles within this virtual realm, reshaping interactions and transactions. Cryptocurrencies, utilizing cryptographic techniques for security, enable decentralized and secure transactions, and NFTs represent ownership or proof of authenticity of unique digital assets through blockchain technology. While NFTs and cryptocurrencies offer innovative opportunities for ownership, trading, and monetization within the metaverse, their use also introduces potential risks and negative consequences, such as financial scams and fraud, highlighting the need for users to exercise caution and diligence in their virtual transactions.
Submitted 23 April, 2024;
originally announced May 2024.
-
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Authors:
Qi Zhao,
M. Salman Asif,
Zhan Ma
Abstract:
The primary focus of Neural Representation for Videos (NeRV) is to effectively model the spatiotemporal consistency of videos. However, current NeRV systems often suffer from significant spatial inconsistency, leading to decreased perceptual quality. To address this issue, we introduce the Pyramidal Neural Representation for Videos (PNeRV), which is built on a multi-scale information connection and comprises a lightweight rescaling operator, the Kronecker Fully-connected layer (KFc), and a Benign Selective Memory (BSM) mechanism. The KFc, inspired by the tensor decomposition of the vanilla Fully-connected layer, facilitates low-cost rescaling and global correlation modeling. BSM adaptively merges high-level features with granular ones. Furthermore, we provide an analysis based on the Universal Approximation Theory of the NeRV system and validate the effectiveness of the proposed PNeRV. We conducted comprehensive experiments to demonstrate that PNeRV surpasses the performance of contemporary NeRV models, achieving the best results in video regression on UVG and DAVIS under various metrics (PSNR, SSIM, LPIPS, and FVD). Compared to vanilla NeRV, PNeRV achieves a +4.49 dB gain in PSNR and a 231% increase in FVD on UVG, along with a +3.28 dB PSNR and 634% FVD increase on DAVIS.
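The abstract describes KFc only as inspired by a tensor decomposition of the fully-connected layer. A minimal sketch, assuming the weight matrix is factored as a Kronecker product $A \otimes B$ (our assumption; the paper's exact factorization may differ), shows why such a layer is cheap: the product never has to be formed, because $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^\top)$ for row-major vectorization. The function `kron_fc` and the layer sizes are hypothetical.

```python
import numpy as np

def kron_fc(x, A, B):
    """Apply the linear map (A kron B) to x without forming the Kronecker product.

    x has length p*q (row-major flattening of a p-by-q array), A is m-by-p and
    B is n-by-q. Uses (A kron B) vec(X) = vec(A X B^T) for row-major vec, so the
    cost is two small matrix products instead of one large one.
    """
    p, q = A.shape[1], B.shape[1]
    X = x.reshape(p, q)
    return (A @ X @ B.T).reshape(-1)

rng = np.random.default_rng(0)
m, p, n, q = 8, 16, 8, 16            # hypothetical factor sizes
A = rng.standard_normal((m, p))
B = rng.standard_normal((n, q))
x = rng.standard_normal(p * q)

dense = np.kron(A, B) @ x            # full (m*n) x (p*q) weight matrix
factored = kron_fc(x, A, B)          # same output computed from the two factors
print(np.allclose(dense, factored))
print(m * n * p * q, A.size + B.size)  # dense vs. factored parameter count
```

Here the dense layer would store 16384 weights while the two factors store 256, which is the kind of saving that makes low-cost rescaling plausible.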
Submitted 13 April, 2024;
originally announced April 2024.
-
Diamond Micro-Chip for Quantum Microscopy
Authors:
Shahidul Asif,
Hang Chen,
Johannes Cremer,
Shantam Ravan,
Jeyson Tamara-Isaza,
Saurabh Lamsal,
Reza Ebadi,
Yan Li,
Ling-Jie Zhou,
Cui-Zu Chang,
John Q. Xiao,
Amir Yacoby,
Ronald L. Walsworth,
Mark J. H. Ku
Abstract:
The nitrogen vacancy (NV) center in diamond is an increasingly popular quantum sensor for microscopy of electrical current, magnetization, and spins. However, efficient NV-sample integration with a robust, high-quality interface remains an outstanding challenge to realize scalable, high-throughput microscopy. In this work, we characterize a diamond micro-chip (DMC) containing a (111)-oriented NV ensemble and demonstrate its utility for high-resolution quantum microscopy. We perform strain imaging of the DMC and find minimal detrimental strain variation across a field of view of tens of micrometers. We find good ensemble NV spin coherence and optical properties in the DMC, suitable for sensitive magnetometry. We then use the DMC to demonstrate wide-field microscopy of electrical current, and show that diffraction-limited quantum microscopy can be achieved. We also demonstrate the deterministic transfer of DMCs with multiple materials of interest for next-generation electronics and spintronics. Lastly, we develop a polymer-based technique for DMC placement. This work establishes the DMC's potential to expand the application of NV quantum microscopy in materials, device, geological, biomedical, and chemical sciences.
Submitted 15 March, 2024;
originally announced March 2024.
-
Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training
Authors:
Edward P. Chandler,
Shirin Shoushtari,
Jiaming Liu,
M. Salman Asif,
Ulugbek S. Kamilov
Abstract:
Plug-and-Play Priors (PnP) is a well-known class of methods for solving inverse problems in computational imaging. PnP methods combine physical forward models with learned prior models specified as image denoisers. A common issue with the learned models is that of a performance drop when there is a distribution shift between the training and testing data. Test-time training (TTT) was recently proposed as a general strategy for improving the performance of learned models when training and testing data come from different distributions. In this paper, we propose PnP-TTT as a new method for overcoming distribution shifts in PnP. PnP-TTT uses deep equilibrium learning (DEQ) for optimizing a self-supervised loss at the fixed points of PnP iterations. PnP-TTT can be directly applied on a single test sample to improve the generalization of PnP. We show through simulations that given a sufficient number of measurements, PnP-TTT enables the use of image priors trained on natural images for image reconstruction in magnetic resonance imaging (MRI).
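The basic PnP idea of combining a physical forward model with a denoiser can be sketched as a proximal-gradient loop: take a gradient step on the data-fidelity term $\tfrac12\|Ax-y\|^2$, then apply the denoiser. This is a minimal sketch of plain PnP only, not PnP-TTT (no deep equilibrium learning or test-time training). The 1D inpainting problem, the moving-average `denoise` standing in for a learned image denoiser, and all sizes are hypothetical choices for illustration.

```python
import numpy as np

def denoise(v, width=5):
    """Stand-in denoiser (hypothetical): a moving-average filter.

    A real PnP method would plug in a learned image denoiser here.
    """
    kernel = np.ones(width) / width
    return np.convolve(v, kernel, mode="same")

def pnp_pgm(y, A, gamma, iters=300):
    """PnP proximal gradient: gradient step on 0.5*||Ax - y||^2, then denoise."""
    x = A.T @ y                       # zero-filled initial estimate
    for _ in range(iters):
        x = denoise(x - gamma * A.T @ (A @ x - y))
    return x

rng = np.random.default_rng(0)
n = 128
t = np.linspace(0.0, 1.0, n)
x_true = np.sin(2 * np.pi * 3 * t)                    # smooth test signal
keep = np.sort(rng.choice(n, size=n // 2, replace=False))
A = np.eye(n)[keep]                                   # forward model: observe half the samples
y = A @ x_true + 0.01 * rng.standard_normal(keep.size)

x_hat = pnp_pgm(y, A, gamma=1.0)                      # ||A||_2 = 1, so gamma = 1 is the usual 1/L step
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
rel_err_zero_fill = np.linalg.norm(A.T @ y - x_true) / np.linalg.norm(x_true)
print(f"PnP: {rel_err:.3f}  zero-fill: {rel_err_zero_fill:.3f}")
```

Swapping `denoise` for a network trained on a different distribution is exactly where the train/test mismatch discussed in the abstract enters.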
Submitted 15 March, 2024;
originally announced March 2024.
-
A Framework for Controlling Multiple Industrial Robots using Mobile Applications
Authors:
Daniela Alvarado,
Dr. Seemal Asif
Abstract:
Purpose: Over the last few decades, developments in hardware and software have enabled the application of advanced systems. In robotics, user interface (UI) design is an intriguing area to explore, given the emergence of compact devices with a wide range of functionalities. Moreover, using the same UI to control several systems is of great interest because it requires less learning effort and time from users. This paper therefore presents a mobile application to control two industrial robots with four modes of operation. Design/methodology/approach: The smartphone was selected as the interface due to its wide range of capabilities, and MIT App Inventor, which targets Android smartphones, was used to create the application. For validation, ROS was used since it is a fundamental framework utilised in industrial robotics, and an Arduino Uno was used to establish data transmission between the smartphone and the NVIDIA Jetson TX2 board. The graphical interface, built in MIT App Inventor, displays the options available in the app, while two Python scripts were written to perform the simulations in ROS and carry out the tests. Findings: The results indicated that using sliders to control the robots is preferable to using the orientation sensor, owing to the sensitivity of the sensor and the difficulty users have in holding the smartphone perfectly still. Another important finding was the limitations of the autonomous mode, in which the robot grabs an object. In this case, the configuration of the Kinect camera and the controllers has a significant impact on the success of the simulation. Finally, the delay was found to be acceptable despite the use of the Arduino Uno to transfer data between the smartphone and the NVIDIA Jetson TX2.
Submitted 12 March, 2024;
originally announced March 2024.
-
A Study on Centralised and Decentralised Swarm Robotics Architecture for Part Delivery System
Authors:
Angelos Dimakos,
Daniel Woodhall,
Seemal Asif
Abstract:
Drones, also known as unmanned aerial vehicles (UAVs), were originally designed for military purposes. With technological advances, they now appear in most aspects of life, from filming to logistics. Their increased use has sometimes made it essential for drones to collaborate in order to perform tasks efficiently within a defined process. This paper investigates the use of a combined centralised and decentralised architecture for the collaborative operation of drones in a parts delivery scenario to enable and expedite the operation of the factories of the future. The centralised and decentralised approaches were extensively researched, with experimentation undertaken to determine the appropriateness of each approach for this use case. Decentralised control was utilised to remove the need for excessive communication during the operation of the drones, resulting in smoother operations. Initial results suggested that the decentralised approach is more appropriate for this use case. The individual functionalities necessary for the implementation of a decentralised architecture were proven and assessed, determining that a combination of multiple individual functionalities, namely VSLAM, dynamic collision avoidance, and object tracking, would give an appropriate solution for use in an industrial setting. A final architecture for the parts delivery system was proposed for future work, using a combined centralised and decentralised approach to combat the limitations inherent in each architecture.
Submitted 12 March, 2024;
originally announced March 2024.
-
On the Design of Human-Robot Collaboration Gestures
Authors:
Anas Shrinah,
Masoud S. Bahraini,
Fahad Khan,
Seemal Asif,
Niels Lohse,
Kerstin Eder
Abstract:
Effective communication between humans and collaborative robots is essential for seamless Human-Robot Collaboration (HRC). In noisy industrial settings, nonverbal communication, such as gestures, plays a key role in conveying commands and information to robots efficiently. While existing literature has thoroughly examined gesture recognition and robots' responses to these gestures, there is a notable gap in exploring the design of these gestures. The criteria for creating efficient HRC gestures are scattered across numerous studies. This paper surveys the design principles of HRC gestures, as contained in the literature, aiming to consolidate a set of criteria for HRC gesture design. It also examines the methods used for designing and evaluating HRC gestures to highlight research gaps and present directions for future research in this area.
Submitted 29 February, 2024;
originally announced February 2024.
-
STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation
Authors:
Rohit Lal,
Saketh Bachu,
Yash Garg,
Arindam Dutta,
Calvin-Khang Ta,
Dripta S. Raychaudhuri,
Hannah Dela Cruz,
M. Salman Asif,
Amit K. Roy-Chowdhury
Abstract:
The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion. Traditional image-based estimators struggle with heavy occlusions due to a lack of temporal context, resulting in inconsistent predictions. While video-based models benefit from processing temporal data, they encounter limitations when faced with prolonged occlusions that extend over multiple frames. This challenge arises because these models struggle to generalize beyond their training datasets, and the variety of occlusions is hard to capture in the training data. Addressing these challenges, we propose STRIDE (Single-video based TempoRally contInuous occlusion Robust 3D Pose Estimation), a novel Test-Time Training (TTT) approach to fit a human motion prior for each video. This approach specifically handles occlusions that were not encountered during the model's training. By employing STRIDE, we can refine a sequence of noisy initial pose estimates into accurate, temporally coherent poses during test time, effectively overcoming the limitations of prior methods. Our framework demonstrates flexibility by being model-agnostic, allowing us to use any off-the-shelf 3D pose estimation method for improving robustness and temporal consistency. We validate STRIDE's efficacy through comprehensive experiments on challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion, where it not only outperforms existing single-image and video-based pose estimation models but also showcases superior handling of substantial occlusions, achieving fast, robust, accurate, and temporally consistent 3D pose estimates.
Submitted 13 March, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
Rota-Baxter operators and Loday-type algebras on the BiHom-associative conformal algebras
Authors:
Sania Asif,
Yao Wang
Abstract:
(Tri)dendriform algebras, Rota-Baxter operators, and closely related NS-algebras have a number of prominent applications in physics, especially in quantum field theory. Building on recent studies relating these structures, this paper considers (tri)dendriform algebras, NS-algebras, and (twisted) Rota-Baxter operators in the context of BiHom-associative conformal algebras. A comprehensive investigation of BiHom-(tri)dendriform conformal algebras and their characterization in terms of conformal bimodules has been conducted. The study of BiHom-NS-conformal algebras reveals that they not only generalize NS-conformal algebras by means of two structure maps but also generalize BiHom-(tri)dendriform conformal algebras. Additionally, BiHom-twisted Rota-Baxter operators are found to be closely related to BiHom-NS-conformal algebras. A comparative study of Rota-Baxter operators on BiHom-associative conformal algebras and on BiHom-(tri)dendriform conformal algebras reveals a relationship between BiHom-quadri conformal algebras and Rota-Baxter operators. Finally, the concept of a Rota-Baxter system (a generalization of the Rota-Baxter operator) is introduced for BiHom-associative conformal algebras and BiHom-dendriform conformal algebras, and the interconnections among these algebras are described. Furthermore, a connection is established between BiHom-quadri conformal algebras and Rota-Baxter systems for BiHom-dendriform conformal algebras.
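The structures above generalize the classical Rota-Baxter identity $R(x)R(y) = R\big(R(x)y + xR(y) + w\,xy\big)$ of weight $w$ (we write $w$ to avoid a clash with the conformal parameter $\lambda$). As a minimal sanity check in the plain associative setting only (no BiHom structure maps and no conformal $\lambda$-bracket, so this is not the paper's construction), the sketch below verifies numerically that the partial-sum operator on finite sequences, with pointwise multiplication, satisfies this identity with weight $w = -1$. The function names are illustrative.

```python
import numpy as np

def partial_sum(a):
    """Operator P(a)_n = a_1 + ... + a_n on finite sequences."""
    return np.cumsum(a)

def rota_baxter_defect(R, a, b, weight):
    """Max residual of R(a)R(b) = R(R(a)b + aR(b) + weight*ab), using pointwise products."""
    lhs = R(a) * R(b)
    rhs = R(R(a) * b + a * R(b) + weight * a * b)
    return np.max(np.abs(lhs - rhs))

rng = np.random.default_rng(1)
a, b = rng.standard_normal(10), rng.standard_normal(10)
print(rota_baxter_defect(partial_sum, a, b, weight=-1.0))  # ~0: the identity holds
print(rota_baxter_defect(partial_sum, a, b, weight=0.0))   # nonzero: weight 0 fails here
```

The check works because $P(a)_nP(b)_n$ sums $a_ib_j$ over all $i, j \le n$, while the two terms on the right cover the pairs $i \le j$ and $j \le i$, counting the diagonal twice; the weight $-1$ term removes that double count.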
Submitted 1 December, 2023;
originally announced December 2023.
-
Cohomology and deformation theory of $\mathcal{O}$-operators on Hom-Lie conformal algebras
Authors:
Sania Asif,
Yao Wang,
Bouzid Mosbahi,
Imed Basdouri
Abstract:
In the present paper, we aim to introduce the cohomology of $\mathcal{O}$-operators defined on a Hom-Lie conformal algebra with respect to a given representation. To obtain the desired results, we describe three different cochain complexes and discuss the interrelation of their coboundary operators, and we show that differential maps on the graded Lie algebra can also be defined using a Maurer-Cartan element. We further find that an $\mathcal{O}$-operator on the given Hom-Lie conformal algebra serves as a Maurer-Cartan element, which yields a differential map $\delta_{\mathcal{T}}$ defined in terms of the $\mathcal{O}$-operator. Next, we introduce the notion of a Hom-pre-Lie conformal algebra, which induces a sub-adjacent Hom-Lie conformal algebra structure. The differential $\delta_{\beta,\alpha}$ of this sub-adjacent Hom-Lie conformal algebra is related to the differential $\delta_{\mathcal{T}}$. Finally, we provide the deformation theory of $\mathcal{O}$-operators on Hom-Lie conformal algebras as an application of the cohomology theory, where we discuss linear and formal deformations in detail.
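For orientation, the classical notion being generalized here is an $\mathcal{O}$-operator on an ordinary Lie algebra $\mathfrak{g}$ with representation $\rho : \mathfrak{g} \to \mathfrak{gl}(V)$; this is only the untwisted, non-conformal special case (no Hom structure map and no conformal $\lambda$-bracket), not the paper's definition. A linear map $\mathcal{T} : V \to \mathfrak{g}$ is an $\mathcal{O}$-operator if it satisfies the first identity below; the second line gives the induced pre-Lie product and its sub-adjacent Lie bracket:

```latex
\begin{gathered}
[\mathcal{T}u, \mathcal{T}v] = \mathcal{T}\big(\rho(\mathcal{T}u)v - \rho(\mathcal{T}v)u\big), \qquad u, v \in V, \\
u \circ v := \rho(\mathcal{T}u)v, \qquad [u, v]_{\mathcal{T}} := u \circ v - v \circ u .
\end{gathered}
```

With these definitions $(V, \circ)$ is a pre-Lie algebra, and $\mathcal{T}$ is a Lie algebra homomorphism from $(V, [\cdot,\cdot]_{\mathcal{T}})$ to $\mathfrak{g}$, since $\mathcal{T}[u, v]_{\mathcal{T}} = [\mathcal{T}u, \mathcal{T}v]$ is exactly the defining identity.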
Submitted 7 December, 2023;
originally announced December 2023.
-
Domain Expansion via Network Adaptation for Solving Inverse Problems
Authors:
Nebiyou Yismaw,
Ulugbek S. Kamilov,
M. Salman Asif
Abstract:
Deep learning-based methods deliver state-of-the-art performance for solving inverse problems that arise in computational imaging. These methods can be broadly divided into two groups: (1) learn a network to map measurements to the signal estimate, which is known to be fragile; (2) learn a prior for the signal to use in an optimization-based recovery. Despite the impressive results from the latter approach, many of these methods also lack robustness to shifts in data distribution, measurements, and noise levels. Such domain shifts result in a performance gap and in some cases introduce undesired artifacts in the estimated signal. In this paper, we explore the qualitative and quantitative effects of various domain shifts and propose a flexible, parameter-efficient framework that adapts pretrained networks to such shifts. We demonstrate the effectiveness of our method for a number of natural image, MRI, and CT reconstruction tasks under domain, measurement model, and noise-level shifts. Our experiments demonstrate that our method provides significantly better performance and parameter efficiency compared to existing domain adaptation techniques.
Submitted 9 October, 2023;
originally announced October 2023.
-
Factorized Tensor Networks for Multi-Task and Multi-Domain Learning
Authors:
Yash Garg,
Nebiyou Yismaw,
Rakib Hyder,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The key challenge and opportunity is to exploit shared information across tasks and domains to improve the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we propose a factorized tensor network (FTN) that can achieve accuracy comparable to independent single-task/domain networks with a small number of additional parameters. FTN uses a frozen backbone network from a source model and incrementally adds task/domain-specific low-rank tensor factors to the shared frozen network. This approach can adapt to a large number of target domains and tasks without catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters compared to existing methods. We performed experiments on widely used multi-domain and multi-task datasets. We show experiments on convolutional architectures with different backbones and on transformer-based architectures. We observed that FTN achieves accuracy similar to that of single-task/domain methods while using only a fraction of additional parameters per task.
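FTN adds task/domain-specific low-rank tensor factors to a frozen backbone. A minimal sketch of the simplest matrix analogue, assuming a single linear layer whose frozen weight $W$ receives an additive rank-$r$ update $UV$ (a LoRA-style stand-in, not FTN's actual tensor factorization), shows why the per-task parameter cost stays small. The class `LowRankAdaptedLinear` and all sizes are hypothetical.

```python
import numpy as np

class LowRankAdaptedLinear:
    """Frozen shared weight W plus a task-specific rank-r update U @ V (hypothetical sketch)."""

    def __init__(self, W, rank, rng):
        self.W = W                                    # shared backbone weight, never updated
        out_dim, in_dim = W.shape
        self.U = np.zeros((out_dim, rank))            # zero init: starts exactly at the backbone
        self.V = 0.01 * rng.standard_normal((rank, in_dim))

    def __call__(self, x):
        # x has shape (batch, in_dim); the effective weight is W + U V.
        return x @ (self.W + self.U @ self.V).T

    def task_params(self):
        # Only U and V are stored per task; W is shared across all tasks.
        return self.U.size + self.V.size

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
layer = LowRankAdaptedLinear(W, rank=4, rng=rng)
x = rng.standard_normal((2, 512))
print(np.allclose(layer(x), x @ W.T))     # True before any task-specific training
print(layer.task_params() / W.size)       # fraction of extra parameters for this task
```

Because each new task only adds its own `U` and `V`, earlier tasks keep their factors and the shared `W` is untouched, which is the mechanism that avoids catastrophic forgetting in this setting.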
Submitted 9 October, 2023;
originally announced October 2023.
-
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Authors:
Md Kaykobad Reza,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge the performance drop due to missing modalities and, in some cases, outperform independent, dedicated networks trained for the available modality combinations. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing-modality robustness of our proposed method on 5 different datasets for multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
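The abstract does not spell out the form of the feature modulation. A minimal sketch, assuming a FiLM-style per-channel scale $\gamma$ and shift $\beta$ applied to frozen intermediate features (our assumption), shows why such adaptation adds very few parameters. The function `modulate`, the number of modulated layers, and the backbone size are hypothetical values for illustration, not figures from the paper.

```python
import numpy as np

def modulate(h, gamma, beta):
    """Feature-wise affine modulation: scale and shift each channel of h (batch x channels)."""
    return gamma * h + beta

rng = np.random.default_rng(0)
channels = 256
h = rng.standard_normal((4, channels))   # intermediate features from a frozen pretrained network

# One (gamma, beta) pair would be learned per missing-modality combination.
# Identity values (gamma = 1, beta = 0) leave the pretrained features unchanged.
gamma = np.ones(channels)
beta = np.zeros(channels)
print(np.allclose(modulate(h, gamma, beta), h))

backbone_params = 25_000_000             # hypothetical size of the pretrained network
added = 2 * channels * 10                # gamma and beta at 10 hypothetical modulated layers
print(added / backbone_params)           # about 0.02% of the backbone
```

Initializing at the identity means adaptation starts from the pretrained behavior, and only the small $\gamma, \beta$ vectors are trained for each modality combination.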
Submitted 26 February, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis
Authors:
Shirin Shoushtari,
Jiaming Liu,
Edward P. Chandler,
M. Salman Asif,
Ulugbek S. Kamilov
Abstract:
Plug-and-Play (PnP) priors are a widely used family of methods for solving imaging inverse problems by integrating physical measurement models with image priors specified using image denoisers. PnP methods have been shown to achieve state-of-the-art performance when the prior is obtained using powerful deep denoisers. Despite extensive work on PnP, distribution mismatch between the training and testing data has often been overlooked in the PnP literature. This paper presents a set of new theoretical and numerical results on prior distribution mismatch and domain adaptation for the alternating direction method of multipliers (ADMM) variant of PnP. Our theoretical result provides an explicit error bound for PnP-ADMM due to the mismatch between the desired denoiser and the one used for inference. Our analysis extends prior work by considering the mismatch under nonconvex data-fidelity terms and expansive denoisers. Our first set of numerical results quantifies the impact of the prior distribution mismatch on the performance of PnP-ADMM for image super-resolution. Our second set of numerical results considers a simple and effective domain adaptation strategy that closes the performance gap due to the use of mismatched denoisers. Our results suggest that PnP-ADMM is relatively robust to prior distribution mismatch, while also showing that the performance gap can be significantly reduced with few training samples from the desired distribution.
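PnP-ADMM alternates a data-fidelity proximal step, a denoising step that stands in for the prior's proximal operator, and a dual update. The following is a minimal NumPy sketch on a toy 1-D inpainting problem, assuming a binary masking forward model and a fixed smoothing filter as a hypothetical stand-in for a (possibly mismatched) deep denoiser; `pnp_admm_inpaint` and `smooth_denoiser` are illustrative names, not code from the paper.

```python
import numpy as np


def smooth_denoiser(v):
    # Stand-in "denoiser": circular [1/4, 1/2, 1/4] smoothing (a real PnP setup would use a deep denoiser).
    return 0.25 * np.roll(v, 1) + 0.5 * v + 0.25 * np.roll(v, -1)


def pnp_admm_inpaint(y, mask, denoiser, gamma=1.0, iters=300):
    """PnP-ADMM for f(x) = 0.5 * ||mask * x - y||^2, with the prior's prox replaced by `denoiser`."""
    m = mask.astype(float)
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    for _ in range(iters):
        v = z - u
        # Data-fidelity prox: closed form because the masking operator is diagonal.
        x = (gamma * m * y + v) / (gamma * m + 1.0)
        # Prior step: apply the denoiser in place of a proximal operator.
        z = denoiser(x + u)
        # Scaled dual variable update.
        u = u + x - z
    return x


# Usage: recover a smooth 1-D signal from roughly 50% of its samples.
n = 200
t = np.arange(n) / n
x_true = np.sin(2 * np.pi * 2 * t)
rng = np.random.default_rng(0)
mask = rng.random(n) < 0.5
y = np.where(mask, x_true, 0.0)
x_hat = pnp_admm_inpaint(y, mask, smooth_denoiser)
print("zero-filled error:", np.linalg.norm(y - x_true))
print("PnP-ADMM error:   ", np.linalg.norm(x_hat - x_true))
```

Swapping `smooth_denoiser` for a filter tuned to a different signal class is one simple way to experiment with the kind of denoiser mismatch the paper studies.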
Submitted 29 September, 2023;
originally announced October 2023.
-
MMSFormer: Multimodal Transformer for Material and Semantic Segmentation
Authors:
Md Kaykobad Reza,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Leveraging information across diverse modalities is known to enhance performance on multimodal segmentation tasks. However, effectively fusing information from different modalities remains challenging due to the unique characteristics of each modality. In this paper, we propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks. MMSFormer outperforms current state-of-the-art models on three different datasets. Starting from a single input modality, performance improves progressively as additional modalities are incorporated, showcasing the effectiveness of the fusion block in combining useful information from diverse input modalities. Ablation studies show that different modules in the fusion block are crucial for overall model performance. Our ablation studies also highlight the capacity of different input modalities to improve performance in the identification of different types of materials. The code and pretrained models will be made available at https://github.com/csiplab/MMSFormer.
Submitted 7 April, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
BiHom-(pre-)Poisson conformal algebra
Authors:
Sania Asif,
Yao Wang
Abstract:
The aim of this study is to introduce the notions of BiHom-Poisson conformal algebra and BiHom-pre-Poisson conformal algebra, together with their related structures. We show that many new BiHom-Poisson conformal algebras can be constructed from a given BiHom-Poisson conformal algebra. Moreover, the tensor product of two BiHom-Poisson conformal algebras is also a BiHom-Poisson conformal algebra. We further describe the conformal bimodules and representation theory of BiHom-Poisson conformal algebras. In addition, we define a BiHom-pre-Poisson conformal algebra as the combination of a BiHom-pre-Lie conformal algebra and a BiHom-dendriform conformal algebra under some compatibility conditions. We also demonstrate how to construct a BiHom-Poisson conformal algebra from a BiHom-pre-Poisson conformal algebra and provide the representation theory for BiHom-pre-Poisson conformal algebras. Finally, a detailed description of $\mathcal{O}$-operators and Rota-Baxter operators on BiHom-Poisson conformal algebras is provided.
Submitted 21 August, 2023;
originally announced August 2023.
-
Cohomology of Twisted Rota-Baxter operators on Associative Conformal Algebra
Authors:
Sania Asif,
Lamei Yuan,
Yao Wang
Abstract:
In this paper, we examine the concept of twisted Rota-Baxter (TRB) operators on associative conformal algebras. Our strategy begins by constructing an $L_\infty$-algebra using Maurer-Cartan elements derived from $H$-twisted Rota-Baxter ($H$-TRB) operators on associative conformal algebras. This structure leads us to explore the cohomology of the conformal $H$-TRB operator, which is characterized as the Hochschild cohomology of a specific associative conformal algebra with coefficients in a conformal bimodule. Furthermore, we study the linear and formal deformations of conformal $H$-TRB operators to explore the application of cohomology.
Submitted 16 August, 2023;
originally announced August 2023.
-
On the Conformal biderivations and conformal commuting maps on the current Lie Conformal superalgebras
Authors:
Sania Asif,
Wang Yao
Abstract:
Let $L$ be a Lie conformal superalgebra and $A$ be an associative commutative algebra with unity. We define the current Lie conformal superalgebra as the tensor product $L \otimes A$. We prove that every conformal super-biderivation $\varphi_{\lambda}$ on $L$ is of the form of the centroid $Cent(L)$. Moreover, we show that every Lie conformal super-biderivation on $L\otimes A$ has the same form as on $L$. We also prove that every Lie conformal linear super-commuting map $\varPsi_{\lambda}$ on $L \otimes A$ belongs to $Cent(L \otimes A)$, provided the same holds for $L$.
Submitted 4 July, 2023;
originally announced July 2023.
-
Derivations, Cohomology and Deformation of BiHom-Associative Dialgebra
Authors:
Ahmed Zahari,
Sania Asif
Abstract:
Motivated by the importance of BiHom-type algebras and of the cohomology of various algebraic structures, this paper is devoted to defining the BiHom-associative dialgebra and its derivations, generalized derivations, and quasi-derivations. We provide a complete classification of these derivations for $2$- and $3$-dimensional BiHom-associative dialgebras. We further generalize the cohomology of BiHom-associative algebras to the cohomology of BiHom-associative dialgebras. As an application of this cohomology, we study one-parameter formal deformations of BiHom-associative dialgebras.
Submitted 4 July, 2023;
originally announced July 2023.
-
BFRT: Blockchained Federated Learning for Real-time Traffic Flow Prediction
Authors:
Collin Meese,
Hang Chen,
Syed Ali Asif,
Wanxin Li,
Chien-Chung Shen,
Mark Nejad
Abstract:
Accurate real-time traffic flow prediction can be leveraged to relieve traffic congestion and its associated negative impacts. Existing centralized deep learning methods have demonstrated high prediction accuracy but raise privacy concerns due to the sensitive nature of transportation data. Moreover, the emerging literature on traffic prediction with distributed learning approaches, including federated learning, primarily focuses on offline learning. This paper proposes BFRT, a blockchained federated learning architecture for online traffic flow prediction using real-time data and edge computing. The proposed approach provides privacy for the underlying data while enabling decentralized model training in real time at the Internet of Vehicles edge. We federate GRU and LSTM models and conduct extensive experiments with dynamically collected arterial traffic data shards. We prototype the proposed permissioned blockchain network on Hyperledger Fabric and perform extensive tests using virtual machines to simulate the edge nodes. Experimental results show that the proposed approach outperforms the centralized models, highlighting the feasibility of our approach for privacy-preserving and decentralized real-time traffic flow prediction.
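The abstract does not state how the federated GRU and LSTM weights are aggregated. As a hedged sketch, assuming the standard federated averaging (FedAvg) rule, in which each parameter tensor is averaged across edge nodes weighted by local sample count, the aggregation step could look like the following; `federated_average` is a hypothetical name, and the blockchain layer is omitted entirely.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation (an assumption; the abstract does not name the rule).

    client_weights: one list of parameter arrays per client, all in the same layer order.
    client_sizes: number of local training samples per client, used as weights.
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer_group in zip(*client_weights):
        # Weighted sum of the same layer across all clients.
        averaged.append(sum(w * (n / total) for w, n in zip(layer_group, client_sizes)))
    return averaged


# Usage: two edge nodes, each holding a single (toy) weight vector.
node_a = [np.array([0.0, 0.0])]
node_b = [np.array([4.0, 8.0])]
global_model = federated_average([node_a, node_b], client_sizes=[1, 3])
print(global_model[0])  # [3. 6.]
```

Only model parameters leave each node in this scheme; the raw traffic data shards stay local, which is the privacy property the abstract relies on.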
Submitted 28 May, 2023;
originally announced May 2023.
-
PIQI: Perceptual Image Quality Index based on Ensemble of Gaussian Process Regression
Authors:
Nisar Ahmed,
Hafiz Muhammad Shahzad Asif,
Hassan Khalid
Abstract:
Digital images contain a lot of redundancy; therefore, compression techniques are applied to reduce image size while retaining reasonable image quality. This becomes more prominent for videos, which contain image sequences and are compressed at higher ratios over low-throughput networks. Assessing image quality in such scenarios has therefore become of particular interest. Subjective evaluation is infeasible in most scenarios, so objective evaluation is preferred. Among the three types of objective quality measures, full-reference and reduced-reference methods require the original image in some form to calculate image quality, which is infeasible in scenarios such as broadcasting, acquisition, or enhancement. Therefore, this paper proposes a no-reference Perceptual Image Quality Index (PIQI) to assess the quality of digital images; it calculates luminance and gradient statistics along with mean subtracted contrast normalized products at multiple scales and in multiple color spaces. These extracted features are fed to a stacked ensemble of Gaussian Process Regression (GPR) models to perform the perceptual quality evaluation. PIQI is evaluated on six benchmark databases and compared with twelve state-of-the-art methods, achieving competitive results. The comparison is based on the RMSE and the Pearson and Spearman correlation coefficients between ground-truth and predicted quality scores. On the CSIQ database, PIQI achieves scores of 0.0552, 0.9802, and 0.9776 for these metrics, respectively. Two cross-dataset evaluation experiments are performed to check the generalization of PIQI.
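Mean subtracted contrast normalized (MSCN) coefficients divide each pixel's deviation from a local Gaussian-weighted mean by the local standard deviation; "products" usually refers to pairwise products of neighbouring coefficients. A NumPy-only sketch follows, assuming the BRISQUE-style conventions found in common implementations (7x7 Gaussian window with sigma = 7/6, stabilizing constant C = 1 for 8-bit images); PIQI's exact window, scales, and color spaces are not given in the abstract, and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma=7.0 / 6.0, radius=3):
    # 1-D Gaussian weights, normalised to sum to 1 (assumed window parameters).
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def separable_blur(img, k):
    # Filter rows, then columns, with reflect padding so the output keeps the input shape.
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def mscn(img, c=1.0):
    img = img.astype(float)
    k = gaussian_kernel()
    mu = separable_blur(img, k)                    # local mean
    var = separable_blur(img * img, k) - mu * mu   # local variance
    sigma = np.sqrt(np.abs(var))                   # abs guards against tiny negative round-off
    return (img - mu) / (sigma + c)


def paired_products(m):
    # Products of each MSCN coefficient with its neighbour in four orientations.
    return {
        "horizontal": m[:, :-1] * m[:, 1:],
        "vertical": m[:-1, :] * m[1:, :],
        "main_diagonal": m[:-1, :-1] * m[1:, 1:],
        "anti_diagonal": m[:-1, 1:] * m[1:, :-1],
    }


# Usage on a random stand-in image.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(32, 48))
coeffs = mscn(image)
products = paired_products(coeffs)
print({name: p.shape for name, p in products.items()})
```

Statistics of these maps (computed at several scales and in several color spaces) would then form part of the feature vector passed to the regression ensemble.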
Submitted 16 May, 2023;
originally announced May 2023.
-
Deep Ensembling for Perceptual Image Quality Assessment
Authors:
Nisar Ahmed,
H. M. Shahzad Asif,
Abdul Rauf Bhatti,
Atif Khan
Abstract:
Blind image quality assessment is a challenging task, particularly because reference information is unavailable. Training a deep neural network requires a large amount of training data, which is not readily available for image quality. Transfer learning is usually adopted to overcome this limitation, and different deep architectures are used for this purpose because they learn features differently. After extensive experiments, we designed a deep architecture containing two CNN architectures as its sub-units. Moreover, we propose a self-collected image database, BIQ2021, with 12,000 images having natural distortions. The self-collected database is subjectively scored and is used for model training and validation. We demonstrate that synthetic distortion databases cannot provide generalization beyond the distortion types they contain, so they are not ideal candidates for general-purpose image quality assessment. Moreover, a large-scale database of 18.75 million images with synthetic distortions is used to pretrain the model, which is then retrained on benchmark databases for evaluation. Experiments are conducted on six benchmark databases, three of which are synthetic distortion databases (LIVE, CSIQ, and TID2013) and three of which are natural distortion databases (LIVE Challenge Database, CID2013, and KonIQ-10k). The proposed approach achieves Pearson correlation coefficients of 0.8992, 0.8472, and 0.9452, respectively, and Spearman correlation coefficients of 0.8863, 0.8408, and 0.9421. Moreover, the performance is demonstrated using perceptually weighted rank correlation to indicate the perceptual superiority of the proposed approach. Multiple experiments are conducted to validate the generalization performance of the proposed model by training on different subsets of the databases and validating on the test subset of the BIQ2021 database.
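The evaluation criteria reported in these image-quality entries (Pearson and Spearman correlation, and RMSE) are easy to compute. A minimal NumPy sketch is shown below; it assumes no tied scores (ties are not rank-averaged) and omits the nonlinear logistic mapping that IQA studies often fit before computing Pearson correlation and RMSE. The function names are illustrative.

```python
import numpy as np


def plcc(pred, mos):
    # Pearson linear correlation between predicted and ground-truth scores.
    return float(np.corrcoef(pred, mos)[0, 1])


def ranks(x):
    # 0-based ranks; ties are not averaged in this simplified sketch.
    x = np.asarray(x)
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r


def srocc(pred, mos):
    # Spearman rank-order correlation = Pearson correlation of the ranks.
    return plcc(ranks(pred), ranks(mos))


def rmse(pred, mos):
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    return float(np.sqrt(np.mean((pred - mos) ** 2)))


# Usage: a monotonic but nonlinear relation has SROCC = 1 while PLCC < 1.
pred = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]
print(plcc(pred, mos), srocc(pred, mos), rmse(pred, mos))
```

The usage line illustrates why both correlations are reported: Spearman rewards correct ordering only, whereas Pearson also penalizes a nonlinear relationship between predicted and subjective scores.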
Submitted 15 May, 2023;
originally announced May 2023.
-
Classification of tridendriform algebra and related structures
Authors:
Bouzid Mosbahi,
Sania Asif,
Ahmed Zahari
Abstract:
The classification of algebraic structures and their derivations is an important and ongoing research area in mathematics and physics, and various results have been obtained in this field. This article presents the classification of tridendriform algebras, which were first studied by Loday and Ronco, including an analysis of the structure constant equations using computer algebra software. We further explicitly classify the derivations and centroids of tridendriform algebras, showing that there are only trivial derivations for $2$- and $3$-dimensional algebras but $21$ non-isomorphic derivations for $4$-dimensional tridendriform algebras, with dimensions ranging from $1$ to $5$. Additionally, for centroids (centroid and quasi-centroid), there are trivial isomorphism classes for $2$-dimensional tridendriform algebras, $6$ non-isomorphic classes for $3$-dimensional tridendriform algebras, and $21$ for $4$-dimensional algebras. The dimension of the centroid ranges from $1$ to $5$, whereas that of the quasi-centroid ranges from $1$ to $10$.
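A tridendriform algebra has three bilinear products, written here as `<`, `>` and `.`, with x*y = x<y + x>y + x.y, subject to the seven Loday-Ronco axioms: (x<y)<z = x<(y*z), (x>y)<z = x>(y<z), (x*y)>z = x>(y>z), (x>y).z = x>(y.z), (x<y).z = x.(y>z), (x.y)<z = x.(y<z), and (x.y).z = x.(y.z). The classification works with structure constants; as a hedged illustration (a numerical check in NumPy, not the computer algebra workflow the authors used), a candidate set of structure constants can be tested against the axioms like this. `T[i, j, k]` is assumed to be the coefficient of e_k in e_i ∘ e_j, and all function names are hypothetical.

```python
import numpy as np


def left(a, b):
    # Coefficients of (e_i a e_j) b e_l in the basis e_m, indexed [i, j, l, m].
    return np.einsum("ijk,klm->ijlm", a, b)


def right(a, b):
    # Coefficients of e_i b (e_j a e_l) in the basis e_m, indexed [i, j, l, m].
    return np.einsum("jlk,ikm->ijlm", a, b)


def is_tridendriform(lt, gt, dot, tol=1e-12):
    """Check the seven tridendriform axioms for structure constants lt (<), gt (>), dot (.)."""
    star = lt + gt + dot
    axioms = [
        (left(lt, lt), right(star, lt)),    # (x<y)<z = x<(y*z)
        (left(gt, lt), right(lt, gt)),      # (x>y)<z = x>(y<z)
        (left(star, gt), right(gt, gt)),    # (x*y)>z = x>(y>z)
        (left(gt, dot), right(dot, gt)),    # (x>y).z = x>(y.z)
        (left(lt, dot), right(gt, dot)),    # (x<y).z = x.(y>z)
        (left(dot, lt), right(lt, dot)),    # (x.y)<z = x.(y<z)
        (left(dot, dot), right(dot, dot)),  # (x.y).z = x.(y.z)
    ]
    return all(np.allclose(lhs, rhs, atol=tol) for lhs, rhs in axioms)


# Usage: one-dimensional examples with basis element e.
one = np.ones((1, 1, 1))
zero = np.zeros((1, 1, 1))
print(is_tridendriform(one, zero, zero))  # True: e<e = e, other products zero
print(is_tridendriform(one, one, zero))   # False: violates (x<y)<z = x<(y*z)
```

Exact classification requires solving these polynomial equations symbolically; a numerical check like this is only useful for verifying a given candidate.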
Submitted 15 May, 2023;
originally announced May 2023.
-
On some derivations of Lie conformal superalgebras
Authors:
Lipeng Luo,
Sania Asif
Abstract:
Let $\mathcal{R}$ be a Lie conformal superalgebra. In this paper, we first investigate the conformal derivation algebra $CDer(\mathcal{R})$, the conformal triple derivation algebra $CTDer(\mathcal{R})$, and the generalized conformal triple derivation algebra $GCTDer(\mathcal{R})$. Moreover, we determine the connections among these derivation algebras. Next, we give a complete classification of the (generalized) conformal triple derivation algebras on all finite simple Lie conformal superalgebras. In particular, $CTDer(\mathcal{R})=CDer(\mathcal{R})$, where $\mathcal{R}$ is a finite simple Lie conformal superalgebra, but for $GCTDer(\mathcal{R})$, we obtain a conclusion that is closely related to $CDer(\mathcal{R})$. Furthermore, we study the $(\varPhi, \varPsi)$-Lie triple derivations on Lie conformal superalgebras, where $\varPhi$ and $\varPsi$ are associated automorphisms of $\varphi_{x}\in gc(\mathcal R)$, and establish some fundamental properties of $(\varPhi, \varPsi)$-Lie triple derivations. Later, we introduce the definition of $(A, B, C, D)$-derivations on Lie conformal superalgebras. We obtain the relationships between the generalized conformal triple derivations and the conformal $(A, B, C, D)$-derivations on Lie conformal superalgebras. Finally, we present the triple homomorphisms of Lie conformal superalgebras.
Submitted 21 April, 2023;
originally announced April 2023.
-
Conformal triple derivations and triple homomorphisms of Lie conformal algebras
Authors:
Sania Asif,
Lipeng Luo,
Yanyong Hong,
Zhixiang Wu
Abstract:
Let $\mathcal{R}$ be a finite Lie conformal algebra. In this paper, we first investigate the conformal derivation algebra $CDer(\mathcal{R})$, the conformal triple derivation algebra $CTDer(\mathcal{R})$, and the generalized conformal triple derivation algebra $GCTDer(\mathcal{R})$. We mainly focus on the connections among these derivation algebras. Next, we give a complete classification of (generalized) conformal triple derivation algebras on all finite simple Lie conformal algebras. In particular, $CTDer(\mathcal{R})= CDer(\mathcal{R})$, where $\mathcal{R}$ is a finite simple Lie conformal algebra. For $GCTDer(\mathcal{R})$, however, we obtain a conclusion that is closely related to $CDer(\mathcal{R})$. Finally, we introduce the definition of a triple homomorphism of a Lie conformal algebra. Furthermore, the triple homomorphisms of all finite simple Lie conformal algebras are also characterized.
Submitted 17 April, 2023;
originally announced April 2023.
-
DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
Authors:
Qi Zhao,
M. Salman Asif,
Zhan Ma
Abstract:
Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
Submitted 13 April, 2023;
originally announced April 2023.
-
On The Application Of Log Compression and Enhanced Denoising In Contrast Enhancement Of Digital Radiography Images
Authors:
M. S. Asif,
Mahesh Raveendranatha Panicker
Abstract:
Digital radiography (DR) has recently become popular for point-of-care imaging. To reduce radiation exposure, controlled radiation based on the as low as reasonably achievable (ALARA) principle is employed, which results in low-contrast images. To address this issue, post-processing algorithms such as the Multiscale Image Contrast Amplification (MUSICA) algorithm can be used to enhance the contrast of DR images even at a low radiation dose. In this study, a modification of the MUSICA algorithm is investigated to determine the potential for further contrast improvement specifically for DR images. We conclude that combining log compression and its inverse at the appropriate stage with a multi-stage MUSICA and denoising is very promising. The proposed method resulted in an average increase of 66.5% in the mean contrast-to-noise ratio (CNR) for the test images considered.
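The idea of wrapping a multiscale contrast amplification between log compression and its inverse can be sketched as below. This is a simplified, hypothetical illustration, not the authors' modified MUSICA: it assumes an undecimated band-pass decomposition built from box blurs of growing radius, a MUSICA-style power-law nonlinearity on the detail layers, no denoising stage, and one common CNR definition (|mean(ROI) - mean(background)| / std(background)).

```python
import numpy as np


def blur(img, radius):
    # Separable box blur with edge padding (stand-in for a pyramid's low-pass filter).
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    p = np.pad(img, radius, mode="edge")
    p = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, p)


def amplify(d, m, power):
    # Power-law nonlinearity: with power < 1, small details are boosted relative to large ones.
    return m * np.sign(d) * (np.abs(d) / m) ** power


def enhance(img, levels=3, power=0.7):
    v = np.log1p(img.astype(float))           # log compression
    m = np.abs(v).max() + 1e-12               # normalisation constant for the nonlinearity
    low, details = v, []
    for level in range(levels):
        smoother = blur(low, 2 ** level)
        details.append(low - smoother)        # band-pass detail layer
        low = smoother
    out = low + sum(amplify(d, m, power) for d in details)
    return np.expm1(out)                      # inverse of the log compression


def cnr(img, roi, background):
    # One common CNR definition; other papers use pooled standard deviations.
    return abs(img[roi].mean() - img[background].mean()) / img[background].std()


# Usage on a synthetic, low-contrast test image.
rng = np.random.default_rng(1)
image = rng.uniform(100.0, 200.0, size=(32, 32))
image[12:20, 12:20] += 20.0                   # a faint square "structure"
roi = np.zeros(image.shape, dtype=bool)
roi[13:19, 13:19] = True
background = np.zeros(image.shape, dtype=bool)
background[:8, :8] = True
before = cnr(image, roi, background)
after = cnr(enhance(image), roi, background)
print(f"CNR change: {100.0 * (after - before) / before:+.1f}%")
```

With `power=1.0` the detail layers telescope back to the log image, so the pipeline reduces to the identity; this is a convenient sanity check before tuning the amplification.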
Submitted 18 April, 2023; v1 submitted 8 April, 2023;
originally announced April 2023.
-
Fast Marching based Tissue Adaptive Delay Estimation for Aberration Corrected Delay and Sum Beamforming in Ultrasound Imaging
Authors:
M. S. Asif,
Gayathri Malamal,
A. N. Madhavanunni,
Vikram Melapudi,
V Rahul,
Abhijit Patil,
Rajesh Langoju,
Mahesh Raveendranatha Panicker
Abstract:
Conventional ultrasound (US) imaging employs delay and sum (DAS) receive beamforming with dynamic receive focus for image reconstruction due to its simplicity and robustness. However, DAS beamforming estimates delays geometrically, assuming a spatially constant speed of sound (SoS) of 1540 m/s throughout the medium irrespective of tissue inhomogeneity. This approximation leads to delay estimation errors that accumulate with depth and degrade the resolution, contrast, and overall accuracy of the US image. In this work, we propose a fast marching based DAS for focused transmissions that leverages an approximate SoS map to estimate the refraction-corrected propagation delay for each pixel in the medium. The proposed approach is validated qualitatively and quantitatively through simulations for imaging depths of up to ~11 cm, where a fat-layer-induced aberration is employed to alter the SoS in the medium. To the best of the authors' knowledge, this is the first work considering the effect of SoS on image quality for deeper imaging.
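Fast marching computes first-arrival travel times by solving the Eikonal equation |∇T| = 1/c over a speed-of-sound map. As a simpler, hedged stand-in (not fast marching and not the authors' implementation), the sketch below runs Dijkstra's algorithm on an 8-connected pixel grid; it captures the key idea that delays follow the SoS map, but grid paths carry a direction-dependent bias that fast marching avoids. `travel_times` is a hypothetical name, and the receive delay for an element/pixel pair would be read from the map computed for that element.

```python
import heapq
import math

import numpy as np


def travel_times(sos, source, dx):
    """Approximate first-arrival times (s) from `source` on a grid with speed map `sos` (m/s).

    Dijkstra on an 8-connected grid; edge time = step length / mean speed of the two pixels.
    """
    rows, cols = sos.shape
    t = np.full(sos.shape, np.inf)
    t[source] = 0.0
    heap = [(0.0, source)]
    steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while heap:
        time, (r, c) = heapq.heappop(heap)
        if time > t[r, c]:
            continue  # stale heap entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = dx * math.hypot(dr, dc)
                cand = time + step / (0.5 * (sos[r, c] + sos[nr, nc]))
                if cand < t[nr, nc]:
                    t[nr, nc] = cand
                    heapq.heappush(heap, (cand, (nr, nc)))
    return t


# Usage: compare a homogeneous medium with one containing a slower layer.
dx = 1e-4                                   # 0.1 mm grid spacing
homogeneous = np.full((41, 41), 1540.0)
layered = homogeneous.copy()
layered[10:16, :] = 1450.0                  # hypothetical slower "fat" layer
element = (0, 20)                           # transducer element at the top edge
t_h = travel_times(homogeneous, element, dx)
t_l = travel_times(layered, element, dx)
print(t_h[40, 20], t_l[40, 20])
```

A constant-SoS DAS would use the homogeneous times everywhere; the difference between the two maps is the depth-accumulating delay error that the abstract describes.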
Submitted 19 April, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Ensemble-based Blackbox Attacks on Dense Prediction
Authors:
Zikui Cai,
Yaoteng Tan,
M. Salman Asif
Abstract:
We propose an approach for adversarial attacks on dense prediction models (such as object detectors and segmentation models). It is well known that attacks generated by a single surrogate model do not transfer to arbitrary (blackbox) victim models. Furthermore, targeted attacks are often more challenging than untargeted attacks. In this paper, we show that a carefully designed ensemble can create effective attacks for a number of victim models. In particular, we show that normalization of the weights for individual models plays a critical role in the success of the attacks. We then demonstrate that adjusting the weights of the ensemble according to the victim model can further improve the performance of the attacks. We performed a number of experiments on object detectors and segmentation models to highlight the significance of our proposed methods. Our proposed ensemble-based method outperforms existing blackbox attack methods for object detection and segmentation. Finally, we show that our proposed method can also generate a single perturbation that can fool multiple blackbox detection and segmentation models simultaneously. Code is available at https://github.com/CSIPlab/EBAD.
Submitted 24 March, 2023;
originally announced March 2023.
-
Disguise without Disruption: Utility-Preserving Face De-Identification
Authors:
Zikui Cai,
Zhongpai Gao,
Benjamin Planche,
Meng Zheng,
Terrence Chen,
M. Salman Asif,
Ziyan Wu
Abstract:
With the rise of cameras and smart sensors, humanity generates an exponentially growing amount of data. This valuable information, including underrepresented cases like AI in medical settings, can fuel new deep-learning tools. However, data scientists must prioritize ensuring privacy for individuals in these untapped datasets, especially for images or videos with faces, which are prime targets for identification methods. Proposed solutions to de-identify such images often compromise non-identifying facial attributes relevant to downstream tasks. In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data. Unlike previous approaches, our solution is firmly grounded in the domains of differential privacy and ensemble-learning research. Our method involves extracting and substituting depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method using multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches in various downstream tasks.
Submitted 18 December, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Compressive Sensing with Tensorized Autoencoder
Authors:
Rakib Hyder,
M. Salman Asif
Abstract:
Deep networks can be trained to map images into a low-dimensional latent space. In many cases, different images in a collection are articulated versions of one another; for example, the same object with different lighting, background, or pose. Furthermore, in many cases, parts of images can be corrupted by noise or missing entries. In this paper, our goal is to recover images without access to the ground-truth (clean) images, using the articulations as a structural prior on the data. Such recovery problems fall under the domain of compressive sensing. We propose to learn an autoencoder with tensor ring factorization on the embedding space to impose structural constraints on the data. In particular, we use a tensor ring structure in the bottleneck layer of the autoencoder that utilizes the soft labels of the structured dataset. We empirically demonstrate the effectiveness of the proposed approach for inpainting and denoising applications. The resulting method achieves better reconstruction quality than other generative prior-based self-supervised recovery approaches for compressive sensing.
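A tensor ring represents a d-way tensor by cores G_k of shape (r_k, n_k, r_{k+1}), with the last rank wrapping around to the first, so that each entry is the trace of a product of core slices. The sketch below only shows how such cores define the full tensor; how the paper ties the cores to the articulation factors and soft labels in the bottleneck is not described in the abstract, and `tensor_ring_full` is a hypothetical name.

```python
import numpy as np


def tensor_ring_full(cores):
    """Contract tensor-ring cores into the full tensor.

    Core k has shape (r_k, n_k, r_{k+1}) and the last rank wraps around to r_1, so
    X[i_1, ..., i_d] = trace(G_1[:, i_1, :] @ G_2[:, i_2, :] @ ... @ G_d[:, i_d, :]).
    """
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index of the partial result with the next core's leading one.
        result = np.tensordot(result, core, axes=([result.ndim - 1], [0]))
    # Close the ring by tracing over the first and last (rank) axes.
    return np.trace(result, axis1=0, axis2=result.ndim - 1)


# Usage: two cores with ring ranks (2, 4) give a 3 x 5 tensor.
rng = np.random.default_rng(0)
cores = [rng.standard_normal((2, 3, 4)), rng.standard_normal((4, 5, 2))]
full = tensor_ring_full(cores)
print(full.shape)  # (3, 5)
```

Storing the cores needs sum_k r_k * n_k * r_{k+1} numbers instead of prod_k n_k, which is what makes the ring a compact structural constraint.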
Submitted 10 March, 2023;
originally announced March 2023.
-
Coded Illumination for 3D Lensless Imaging
Authors:
Yucheng Zheng,
M. Salman Asif
Abstract:
Mask-based lensless cameras offer a novel design for imaging systems by replacing the lens in a conventional camera with a coded mask layer. Each pixel of the lensless camera encodes information about the entire 3D scene. Existing methods for 3D reconstruction from lensless measurements suffer from poor spatial and depth resolution. This is partially due to ill-conditioning of the system, which arises because the point-spread functions (PSFs) of different depth planes are very similar. In this paper, we propose to capture multiple measurements of the scene under a sequence of coded illumination patterns to improve the quality of 3D image reconstruction. In addition, we place the illumination source at a distance from the camera. With such a baseline between the lensless camera and the illumination source, the camera observes a slice of the 3D volume, and the PSFs of different depth planes become easier to distinguish from one another. We present simulation results along with experimental results from a camera prototype to demonstrate the effectiveness of our approach.
Submitted 22 December, 2022;
originally announced December 2022.
-
Efficient Visual Computing with Camera RAW Snapshots
Authors:
Zhihao Li,
Ming Lu,
Xu Zhang,
Xin Feng,
M. Salman Asif,
Zhan Ma
Abstract:
Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or for visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, converting RAW to RGB with an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework that performs high-level semantic understanding and low-level compression directly on RAW images, without the ISP subsystem that has been used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression than RGB-domain processing. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. Eliminating the ISP also brings potential reductions in computation and processing time.
Submitted 25 January, 2024; v1 submitted 15 December, 2022;
originally announced December 2022.
-
On generalized derivations of polynomial vector fields Lie algebras
Authors:
Princy Randriambololondrantomalala,
Sania Asif
Abstract:
In this paper, we study the generalized derivations of a Lie subalgebra of the Lie algebra of polynomial vector fields on $\mathbb{R}^n$, where $n\geq1$, that contains all constant vector fields and the Euler vector field, under some conditions on this Lie subalgebra.
Submitted 30 November, 2022; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Image Quality Assessment for Foliar Disease Identification (AgroPath)
Authors:
Nisar Ahmed,
Hafiz Muhammad Shahzad Asif,
Gulshan Saleem,
Muhammad Usman Younus
Abstract:
Crop diseases are a major threat to food security, and their rapid identification is important to prevent yield loss. Swift identification of these diseases is difficult due to the lack of necessary infrastructure. Recent advances in computer vision and the increasing penetration of smartphones have paved the way for smartphone-assisted disease identification. Most plant diseases leave characteristic artifacts on the foliar structure of the plant. This study was conducted in 2020 at the Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan, to investigate leaf-based plant disease identification. It provides a deep neural network-based solution to foliar disease identification that incorporates image quality assessment to select images of the required quality before identification; the system is named Agricultural Pathologist (AgroPath). An image captured by a novice photographer may contain noise, lack structure, or be blurred, which can result in a failed or inaccurate diagnosis. The AgroPath model achieved 99.42% accuracy for foliar disease identification. The proposed addition of image quality assessment can be especially useful for applying foliar disease identification in agriculture.
Submitted 26 September, 2022;
originally announced September 2022.
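The abstract does not say how image quality is assessed. As a stand-in, the sketch below uses the variance of a discrete Laplacian, a common sharpness heuristic, to gate blurry or structureless images before they reach a classifier. The heuristic, the threshold of 100 (for 8-bit grayscale), and the helper names are assumptions for illustration, not AgroPath's actual quality model.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour discrete Laplacian; low values suggest blur."""
    g = np.asarray(gray, dtype=np.float64)
    # Laplacian on the interior pixels: sum of 4 neighbours minus 4x the centre.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return lap.var()

def passes_quality_gate(gray, threshold=100.0):
    """Hypothetical gate: only forward sufficiently sharp images to the classifier."""
    return laplacian_variance(gray) >= threshold
```

A flat, featureless image scores zero and is rejected, while an image with strong edges scores high; in practice the threshold would have to be tuned on the target camera and data.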
-
Leveraging Local Patch Differences in Multi-Object Scenes for Generative Adversarial Attacks
Authors:
Abhishek Aich,
Shasha Li,
Chengyu Song,
M. Salman Asif,
Srikanth V. Krishnamurthy,
Amit K. Roy-Chowdhury
Abstract:
State-of-the-art generative model-based attacks against image classifiers overwhelmingly focus on single-object (i.e., single dominant object) images. Different from such settings, we tackle the more practical problem of generating adversarial perturbations using multi-object (i.e., multiple dominant objects) images, as they are representative of most real-world scenes. Our goal is to design an attack strategy that can learn from such natural scenes by leveraging the local patch differences that occur inherently in such images (e.g., the difference between the local patch on the object 'person' and the one on the object 'bike' in a traffic scene). Our key idea is to misclassify an adversarial multi-object image by confusing the victim classifier for each local patch in the image. Based on this, we propose a novel generative attack (called Local Patch Difference or LPD-Attack) in which a novel contrastive loss function uses these local differences in the feature space of multi-object scenes to optimize the perturbation generator. Through various experiments across diverse victim convolutional neural networks, we show that our approach outperforms baseline generative attacks with highly transferable perturbations when evaluated under different white-box and black-box settings.
Submitted 3 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
GAMA: Generative Adversarial Multi-Object Scene Attacks
Authors:
Abhishek Aich,
Calvin-Khang Ta,
Akash Gupta,
Chengyu Song,
Srikanth V. Krishnamurthy,
M. Salman Asif,
Amit K. Roy-Chowdhury
Abstract:
The majority of methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). Natural scenes, on the other hand, include multiple dominant objects that are semantically related. It is therefore crucial to design attack strategies that look beyond learning on single-object scenes or attacking single-object victim classifiers. Because generative models inherently produce perturbations that transfer strongly to unknown models, this paper presents the first approach that uses generative models for adversarial attacks on multi-object scenes. To represent the relationships between different objects in the input scene, we leverage the open-source pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation of exploiting the semantics encoded in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool for training formidable perturbation generators for multi-object scenes. Using the joint image-text features to train the generator, we show that GAMA can craft potent transferable perturbations that fool victim classifiers in various attack settings. For example, GAMA triggers ~16% more misclassification than state-of-the-art generative approaches in black-box settings where both the classifier architecture and the data distribution of the attacker differ from those of the victim. Our code is available here: https://abhishekaich27.github.io/gama.html
Submitted 15 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Blackbox Attacks via Surrogate Ensemble Search
Authors:
Zikui Cai,
Chengyu Song,
Srikanth Krishnamurthy,
Amit Roy-Chowdhury,
M. Salman Asif
Abstract:
Blackbox adversarial attacks can be categorized into transfer- and query-based attacks. Transfer methods do not require any feedback from the victim model but provide lower success rates than query-based methods. Query attacks often require a large number of queries to succeed. To achieve the best of both approaches, recent efforts have tried to combine them, but they still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for Blackbox Attacks via Surrogate Ensemble Search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights in the loss function using queries generated by the perturbation machine. Since the dimension of the search space is small (equal to the number of surrogate models), the search requires only a few queries. We demonstrate that our proposed method achieves a better success rate with at least 30x fewer queries than state-of-the-art methods on different image classifiers trained on ImageNet. In particular, our method requires as few as 3 queries per image (on average) to achieve more than a 90% success rate for targeted attacks, and 1-2 queries per image for over a 99% success rate for untargeted attacks. Our method is also effective on the Google Cloud Vision API, achieving a 91% untargeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our proposed method are highly transferable and can be adopted for hard-label blackbox attacks. Finally, we show the effectiveness of BASES for hiding attacks on object detectors.
Submitted 23 November, 2022; v1 submitted 6 August, 2022;
originally announced August 2022.
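A minimal sketch of the surrogate-weight search described in the BASES abstract, under simplifying assumptions: each surrogate is represented only by a gradient function, the perturbation machine is signed-gradient PGD inside an L-infinity ball, and the victim returns a scalar loss per query (a score-based setting). The coordinate search below (raise or lower one weight at a time and keep whichever lowers the victim loss) is an illustration in the spirit of BASES, not the paper's exact update rule; all function names are hypothetical.

```python
import numpy as np

def perturbation_machine(weights, surrogate_grads, x, eps=0.05, steps=10, lr=0.01):
    """Signed-gradient PGD on the weighted ensemble loss sum_i w_i * L_i(x + delta)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = sum(w * g(x + delta) for w, g in zip(weights, surrogate_grads))
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)  # stay in the L_inf ball
    return delta

def surrogate_weight_search(victim_loss, surrogate_grads, x, step=0.5,
                            max_queries=20, success_loss=0.0):
    """Coordinate search over ensemble weights; each victim_loss call is one query."""
    n = len(surrogate_grads)
    weights = np.full(n, 1.0 / n)  # start from a uniform ensemble
    delta = perturbation_machine(weights, surrogate_grads, x)
    best = victim_loss(x + delta)  # first victim query
    queries = 1
    i = 0
    while queries < max_queries and best > success_loss:
        for sign in (1.0, -1.0):
            if queries >= max_queries:
                break
            trial = weights.copy()
            trial[i] = max(trial[i] + sign * step, 0.0)  # keep weights non-negative
            if trial.sum() == 0.0:
                continue
            trial /= trial.sum()  # renormalize so weights sum to 1
            cand = perturbation_machine(trial, surrogate_grads, x)
            loss = victim_loss(x + cand)  # one more victim query
            queries += 1
            if loss < best:
                weights, delta, best = trial, cand, loss
        i = (i + 1) % n  # move on to the next surrogate's weight
    return delta, weights, best, queries
```

Each perturbation is computed entirely on the surrogates, and the victim is only queried to score it, so the query budget grows with the number of surrogate weights rather than with the image dimension.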
-
H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System
Authors:
Ming Cheng,
Yiling Xu,
Wang Shen,
M. Salman Asif,
Chao Ma,
Jun Sun,
Zhan Ma
Abstract:
High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual-camera system in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for effectively reconstructing the H2-Stereo video. We utilize a disparity network to transfer spatiotemporal information across views even in scenes with large disparity; based on it, we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. LIFnet is trained end-to-end using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and on camera-captured real data with large disparity. Ablation studies explore various aspects of our system, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures, and applications, to fully understand its capability for potential applications.
Submitted 4 August, 2022;
originally announced August 2022.
-
Incremental Task Learning with Incremental Rank Updates
Authors:
Rakib Hyder,
Ken Shao,
Boyu Hou,
Panos Markopoulos,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Incremental task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where the training data for each task is available only during the training of that task. Neural networks tend to forget older tasks when they are trained on newer tasks; this property is often known as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights for each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency than episodic memory- and mask-based approaches. Our code will be available at https://github.com/CSIPlab/task-increment-rank-update.git
Submitted 19 July, 2022;
originally announced July 2022.
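Following the incremental-rank description above, the weight of a layer for task t can be read as W_t = sum over k <= t of s_t[k] * outer(u_k, v_k), where outer(u_k, v_k) is the rank-1 factor learned for task k and s_t is the selector vector for task t. A minimal NumPy sketch, assuming one rank-1 factor per task, frozen factors for earlier tasks, and placeholder all-ones selectors in place of learned ones; the class name is hypothetical and training is omitted.

```python
import numpy as np

class IncrementalRankLayer:
    """Hypothetical layer whose weight is a selector-weighted sum of rank-1 factors."""

    def __init__(self, out_dim, in_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.out_dim, self.in_dim = out_dim, in_dim
        self.us, self.vs = [], []  # one frozen rank-1 factor (u_k, v_k) per task
        self.selectors = []        # selectors[t] has length t + 1

    def add_task(self):
        """Append a new rank-1 factor and a selector for the new task; return its id."""
        self.us.append(self.rng.standard_normal(self.out_dim))
        self.vs.append(self.rng.standard_normal(self.in_dim))
        t = len(self.us)
        self.selectors.append(np.ones(t))  # placeholder; learned in practice
        return t - 1

    def weight(self, task):
        """W_task = sum_{k <= task} s_task[k] * outer(u_k, v_k)."""
        s = self.selectors[task]
        return sum(s[k] * np.outer(self.us[k], self.vs[k]) for k in range(task + 1))
```

Since each task keeps its own selector and earlier factors are frozen, adding a task changes neither the factors nor the selector an earlier task uses, and the per-task cost is one rank-1 factor (out_dim + in_dim numbers) plus a short selector.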
-
On the cohomology based on the generalized representations of $n$-Lie Algebras
Authors:
Afi Maha,
Sania Asif,
Chouaibi Sami,
Basdouri Imed
Abstract:
In the present paper, we define a new class of representations of $n$-Lie algebras, called generalized representations. We study the cohomology theory corresponding to generalized representations of $n$-Lie algebras and show its relation to the cohomology corresponding to the usual representations. Furthermore, we compute the low-dimensional cocycles.
Submitted 11 July, 2022;
originally announced July 2022.
-
Above-room-temperature ferromagnetism in ultrathin van der Waals magnet
Authors:
Hang Chen,
Shahidul Asif,
Kapildeb Dolui,
Yang Wang,
Jeyson Tamara Isaza,
V. M. L. Durga Prasad Goli,
Matthew Whalen,
Xinhao Wang,
Zhijie Chen,
Huiqin Zhang,
Kai Liu,
Deep Jariwala,
M. Benjamin Jungfleisch,
Chitraleema Chakraborty,
Andrew F. May,
Michael A. McGuire,
Branislav K. Nikolic,
John Q. Xiao,
Mark J. H. Ku
Abstract:
Two-dimensional (2D) magnetic van der Waals materials provide a powerful platform for studying the fundamental physics of low-dimensional magnetism, engineering novel magnetic phases, and enabling ultrathin and highly tunable spintronic devices. To realize high-quality and practical devices for such applications, there is a critical need for robust 2D magnets with ordering temperatures above room temperature that can be created via exfoliation. Here we report a study of exfoliated flakes of cobalt-substituted Fe5GeTe2 (CFGT) exhibiting magnetism above room temperature. Via quantum magnetic imaging with nitrogen-vacancy centers in diamond, ferromagnetism at room temperature was observed in CFGT flakes as thin as 16 nm. This corresponds to one of the thinnest room-temperature 2D magnet flakes exfoliated from robust single crystals, reaching a thickness relevant to practical spintronic applications. The Curie temperature Tc of CFGT ranges from 310 K in the thinnest flake studied to 328 K in the bulk. To investigate the prospect of high-temperature monolayer ferromagnetism, Monte Carlo calculations were performed, which predicted a high Tc of ~270 K in CFGT monolayers. Pathways toward further enhancing the monolayer Tc are discussed. These results support CFGT as a promising platform for realizing high-quality room-temperature 2D magnet devices.
Submitted 14 June, 2022;
originally announced June 2022.
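The CFGT abstract mentions Monte Carlo calculations of the monolayer Curie temperature but gives no Hamiltonian. As a generic illustration of the Metropolis technique only, here is a toy sketch for a 2D nearest-neighbour Ising model on a periodic square lattice in units with J = k_B = 1; this model, the lattice size, and all parameters are stand-in assumptions, not the CFGT model used in the paper.

```python
import numpy as np

def metropolis_ising(L=16, T=2.0, sweeps=100, J=1.0, seed=0):
    """Toy 2D Ising Metropolis run (k_B = 1, periodic boundaries).

    Stand-in model for illustration only. Starts fully ordered and
    returns the absolute magnetization per spin after the last sweep.
    """
    rng = np.random.default_rng(seed)
    spins = np.ones((L, L), dtype=int)  # ordered (all spins up) start
    for _ in range(sweeps):
        for _ in range(L * L):  # one sweep = L*L attempted single-spin flips
            i, j = rng.integers(0, L, size=2)
            nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * J * spins[i, j] * nb  # energy change if spin (i, j) flips
            # Metropolis rule: always accept downhill moves, else with prob exp(-dE/T).
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] = -spins[i, j]
    return abs(spins.mean())
```

Scanning T and noting where the final |m| collapses gives a rough, finite-size estimate of the ordering temperature, which for this toy model is known exactly to be about 2.27 J.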
-
2022 Review of Data-Driven Plasma Science
Authors:
Rushil Anirudh,
Rick Archibald,
M. Salman Asif,
Markus M. Becker,
Sadruddin Benkadda,
Peer-Timo Bremer,
Rick H. S. Budé,
C. S. Chang,
Lei Chen,
R. M. Churchill,
Jonathan Citrin,
Jim A Gaffney,
Ana Gainaru,
Walter Gekelman,
Tom Gibbs,
Satoshi Hamaguchi,
Christian Hill,
Kelli Humbird,
Sören Jalas,
Satoru Kawaguchi,
Gon-Ho Kim,
Manuel Kirchen,
Scott Klasky,
John L. Kline,
Karl Krushelnick
, et al. (38 additional authors not shown)
Abstract:
Data science and technology offer transformative tools and methods to science. This review article highlights the latest developments and progress in the interdisciplinary field of data-driven plasma science (DDPS). Large amounts of data and machine learning algorithms go hand in hand. Most plasma data today, whether experimental, observational, or computational, are generated or collected by machines. It is now becoming impractical for humans to analyze all the data manually. Therefore, it is imperative to train machines to analyze and (eventually) interpret such data as intelligently as humans, but far more efficiently in quantity. Despite the recent impressive progress in applications of data science to plasma science and technology, the emerging field of DDPS is still in its infancy. Fueled by some of the most challenging problems, such as fusion energy, plasma processing of materials, and the fundamental understanding of the universe through observable plasma phenomena, DDPS is expected to continue to benefit significantly from the interdisciplinary marriage between plasma science and data science into the foreseeable future.
Submitted 31 May, 2022;
originally announced May 2022.
-
Event Transformer
Authors:
Bin Jiang,
Zhihao Li,
M. Salman Asif,
Xun Cao,
Zhan Ma
Abstract:
The event camera's low power consumption and ability to capture microsecond brightness changes make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs). However, these approaches often sacrifice temporal granularity or require specialized devices for processing. This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token. This approach preserves the sequence's intricate spatiotemporal attributes at the event level. Moreover, we propose a Three-way Attention mechanism in the Event Transformer Block (ETB) to collaboratively construct temporal and spatial correlations between events. We compare our proposed token-based event representation extensively with other prevalent methods for object classification and optical flow estimation. The experimental results showcase its competitive performance while demanding minimal computational resources on standard devices. Our code is publicly accessible at https://github.com/NJUVISION/EventTransformer.
Submitted 12 June, 2024; v1 submitted 11 April, 2022;
originally announced April 2022.
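A minimal sketch of the event-token idea from the Event Transformer abstract, assuming events arrive as rows (x, y, t, p) with polarity p in {0, 1}: each event becomes its own normalized token, and a plain single-head scaled dot-product attention mixes information across tokens. The normalization scheme and helper names are hypothetical, and the attention here is ordinary self-attention, not the paper's Three-way Attention in the ETB.

```python
import numpy as np

def events_to_tokens(events, width, height):
    """Map raw events (x, y, t, p) to one normalized feature vector (token) per event."""
    x, y, t, p = events.T
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)  # timestamps to [0, 1]
    return np.stack([x / (width - 1), y / (height - 1), t_norm, 2.0 * p - 1.0], axis=1)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over event tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over all other events
    return attn @ v
```

Because no frame or voxel grid is built, every event keeps its exact timestamp; the price is that plain attention costs O(N^2) in the number of events, which is one reason specialized attention designs are of interest.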
-
Zero-Query Transfer Attacks on Context-Aware Object Detectors
Authors:
Zikui Cai,
Shantanu Rane,
Alejandro E. Brito,
Chengyu Song,
Srikanth V. Krishnamurthy,
Amit K. Roy-Chowdhury,
M. Salman Asif
Abstract:
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results. A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check: if the detected objects are not consistent with an appropriately defined context, an attack is suspected. Stronger attacks are needed to fool such context-aware detectors. We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check of black-box object detectors operating on complex, natural scenes. Unlike many black-box attacks that make repeated attempts and thereby open themselves to detection, we assume a "zero-query" setting, where the attacker has no knowledge of the classification decisions of the victim system. First, we derive multiple attack plans that assign incorrect labels to victim objects in a context-consistent manner. Then we design and use a novel data structure, which we call the perturbation success probability matrix, that enables us to filter the attack plans and choose the one most likely to succeed. This final attack plan is implemented using a perturbation-bounded adversarial attack algorithm. We compare our zero-query attack against a few-query scheme that repeatedly checks whether the victim system is fooled. We also compare against state-of-the-art context-agnostic attacks. Against a context-aware defense, the fooling rate of our zero-query approach is significantly higher than that of context-agnostic approaches and higher than that achievable with up to three rounds of the few-query scheme.
Submitted 29 March, 2022;
originally announced March 2022.
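The zero-query abstract describes choosing, among context-consistent attack plans, the one most likely to succeed using a perturbation success probability matrix. A minimal sketch of that selection step, assuming (hypothetically) that entry pspm[a, b] estimates the chance of turning an object of class a into class b and that objects succeed independently, so a plan's score is the product of its entries. This scoring rule is an illustration, not necessarily the paper's; plan generation and the context-consistency filter are omitted.

```python
import numpy as np

def plan_success_probability(plan, victim_labels, pspm):
    """Score a plan as the product of per-object success probabilities.

    plan[j] is the target label for object j, whose true label is victim_labels[j].
    Assumes (hypothetically) that per-object successes are independent.
    """
    return float(np.prod([pspm[v, target] for v, target in zip(victim_labels, plan)]))

def choose_plan(plans, victim_labels, pspm):
    """Return the candidate plan with the highest estimated success probability."""
    scores = [plan_success_probability(p, victim_labels, pspm) for p in plans]
    best = int(np.argmax(scores))
    return plans[best], scores[best]
```

Since the matrix is estimated offline, for example on surrogate detectors, ranking plans this way needs no queries to the victim, which is what makes the zero-query setting possible.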
-
Provable and Efficient Continual Representation Learning
Authors:
Yingcong Li,
Mingchen Li,
M. Salman Asif,
Samet Oymak
Abstract:
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting. While there is a rich set of techniques for CL, relatively little is understood about how representations built by previous tasks benefit new tasks that are added to the network. To address this, we study the problem of continual representation learning (CRL), where we learn an evolving representation as new tasks arrive. Focusing on zero-forgetting methods where tasks are embedded in subnetworks (e.g., PackNet), we first provide experiments demonstrating that CRL can significantly boost sample efficiency when learning new tasks. To explain this, we establish theoretical guarantees for CRL by providing sample complexity and generalization error bounds for new tasks, formalizing the statistical benefits of previously learned representations. Our analysis and experiments also highlight the importance of the order in which tasks are learned. Specifically, we show that CL benefits if the initial tasks have large sample sizes and high "representation diversity". Diversity ensures that adding new tasks incurs a small representation mismatch, so they can be learned with few samples while training only a few additional nonzero weights. Finally, we ask whether each task subnetwork can be made efficient at inference time while retaining the benefits of representation learning. To this end, we propose an inference-efficient variation of PackNet called Efficient Sparse PackNet (ESPN), which employs joint channel & weight pruning. ESPN embeds tasks in channel-sparse subnets requiring up to 80% fewer FLOPs to compute while approximately retaining accuracy, and it is very competitive with a variety of baselines. In summary, this work takes a step towards data- and compute-efficient CL from a representation learning perspective. GitHub page: https://github.com/ucr-optml/CtRL
Submitted 7 November, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
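For readers unfamiliar with PackNet, the zero-forgetting baseline named in the abstract, here is a minimal sketch of its weight bookkeeping. It assumes each weight carries an owner id (0 for free) and that each new task claims the largest-magnitude free weights; in actual PackNet the network is trained and fine-tuned around this pruning step, and ESPN's additional channel pruning is not shown. The helper names are hypothetical.

```python
import numpy as np

def task_view(weights, owner, task):
    """Weights visible to `task`: those owned by this task or any earlier one (0 = free)."""
    mask = (owner >= 1) & (owner <= task)
    return weights * mask

def assign_new_task(weights, owner, task, keep_fraction):
    """Claim the largest-magnitude free weights for `task`; the rest stay free."""
    free = np.flatnonzero(owner == 0)
    k = int(round(keep_fraction * free.size))
    # Indices (into the flattened arrays) of the k free weights with largest magnitude.
    top = free[np.argsort(-np.abs(weights.flat[free]))[:k]]
    owner = owner.copy()
    owner.flat[top] = task
    return owner
```

Because an earlier task only reads weights that it or its predecessors own, claiming weights for a new task leaves every earlier task's view unchanged, which is the zero-forgetting property the abstract relies on.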