-
Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation
Authors:
Kevin Stangl,
Marius Arvinte,
Weilin Xu,
Cory Cornelius
Abstract:
Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach that enables effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across various environmental conditions and are robust to distribution shifts is an open problem. We investigate the performance of WinCLIP [14] zero-shot anomaly segmentation algori…
▽ More
Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach that enables effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across various environmental conditions and are robust to distribution shifts is an open problem. We investigate the performance of WinCLIP [14] zero-shot anomaly segmentation algorithm by perturbing test data using three semantic transformations: bounded angular rotations, bounded saturation shifts, and hue shifts. We empirically measure a lower performance bound by aggregating across per-sample worst-case perturbations and find that average performance drops by up to 20% in area under the ROC curve and 40% in area under the per-region overlap curve. We find that performance is consistently lowered on three CLIP backbones, regardless of model architecture or learning objective, demonstrating a need for careful performance evaluation.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE
Authors:
Marius Arvinte,
Cory Cornelius,
Jason Martin,
Nageen Himayat
Abstract:
Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization…
▽ More
Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks and the relation to sample complexity, where the compressed size of a sample is used as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experimental evaluations on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
Authors:
ShengYun Peng,
Weilin Xu,
Cory Cornelius,
Matthew Hull,
Kevin Li,
Rahul Duggal,
Mansi Phute,
Jason Martin,
Duen Horng Chau
Abstract:
Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through…
▽ More
Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through adopting squeeze and excitation blocks and non-parametric smooth activation functions. Through extensive experiments across a wide spectrum of dataset scales, adversarial training methods, model parameters, and network design spaces, our principles consistently and markedly improve AutoAttack accuracy: 1-3 percentage points (pp) on CIFAR-10 and CIFAR-100, and 4-9 pp on ImageNet. The code is publicly available at https://github.com/poloclub/robust-principles.
△ Less
Submitted 31 August, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
Authors:
Mansi Phute,
Alec Helbling,
Matthew Hull,
ShengYun Peng,
Sebastian Szyller,
Cory Cornelius,
Duen Horng Chau
Abstract:
Large language models (LLMs) are popular for high-quality text generation but can produce harmful content, even when aligned with human values through reinforcement learning. Adversarial prompts can bypass their safety measures. We propose LLM Self Defense, a simple approach to defend against these attacks by having an LLM screen the induced responses. Our method does not require any fine-tuning,…
▽ More
Large language models (LLMs) are popular for high-quality text generation but can produce harmful content, even when aligned with human values through reinforcement learning. Adversarial prompts can bypass their safety measures. We propose LLM Self Defense, a simple approach to defend against these attacks by having an LLM screen the induced responses. Our method does not require any fine-tuning, input preprocessing, or iterative output generation. Instead, we incorporate the generated content into a pre-defined prompt and employ another instance of an LLM to analyze the text and predict whether it is harmful. We test LLM Self Defense on GPT 3.5 and Llama 2, two of the current most prominent LLMs against various types of attacks, such as forcefully inducing affirmative responses to prompts and prompt engineering attacks. Notably, LLM Self Defense succeeds in reducing the attack success rate to virtually 0 using both GPT 3.5 and Llama 2. The code is publicly available at https://github.com/poloclub/llm-self-defense
△ Less
Submitted 2 May, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
RobArch: Designing Robust Architectures against Adversarial Attacks
Authors:
ShengYun Peng,
Weilin Xu,
Cory Cornelius,
Kevin Li,
Rahul Duggal,
Duen Horng Chau,
Jason Martin
Abstract:
Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this w…
▽ More
Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this work, we present the first large-scale systematic study on the robustness of DNN architecture components under fixed parameter budgets. Through our investigation, we distill 18 actionable robust network design guidelines that empower model developers to gain deep insights. We demonstrate these guidelines' effectiveness by introducing the novel Robust Architecture (RobArch) model that instantiates the guidelines to build a family of top-performing models across parameter capacities against strong adversarial attacks. RobArch achieves the new state-of-the-art AutoAttack accuracy on the RobustBench ImageNet leaderboard. The code is available at $\href{https://github.com/ShengYun-Peng/RobArch}{\text{this url}}$.
△ Less
Submitted 8 January, 2023;
originally announced January 2023.
-
Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models
Authors:
Xinlei He,
Zheng Li,
Weilin Xu,
Cory Cornelius,
Yang Zhang
Abstract:
Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to infer whether an input sample was used to train the model. Over the past few years, researchers have produced many membership inference attacks and defenses. However, these attacks and defenses employ a variety of strategies and are conducted in diffe…
▽ More
Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to infer whether an input sample was used to train the model. Over the past few years, researchers have produced many membership inference attacks and defenses. However, these attacks and defenses employ a variety of strategies and are conducted in different models and datasets. The lack of comprehensive benchmark, however, means we do not understand the strengths and weaknesses of existing attacks and defenses.
We fill this gap by presenting a large-scale measurement of different membership inference attacks and defenses. We systematize membership inference through the study of nine attacks and six defenses and measure the performance of different attacks and defenses in the holistic evaluation. We then quantify the impact of the threat model on the results of these attacks. We find that some assumptions of the threat model, such as same-architecture and same-distribution between shadow and target models, are unnecessary. We are also the first to execute attacks on the real-world data collected from the Internet, instead of laboratory datasets. We further investigate what determines the performance of membership inference attacks and reveal that the commonly believed overfitting level is not sufficient for the success of the attacks. Instead, the Jensen-Shannon distance of entropy/cross-entropy between member and non-member samples correlates with attack performance much better. This gives us a new way to accurately predict membership inference risks without running the attack. Finally, we find that data augmentation degrades the performance of existing attacks to a larger extent, and we propose an adaptive attack using augmentation to train shadow and attack models that improve attack performance.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Synthetic Dataset Generation for Adversarial Machine Learning Research
Authors:
Xiruo Liu,
Shibani Singh,
Cory Cornelius,
Colin Busho,
Mike Tan,
Anindya Paul,
Jason Martin
Abstract:
Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical s…
▽ More
Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real-world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://github.com/carla-simulator/carla/pull/4992.
△ Less
Submitted 21 July, 2022;
originally announced July 2022.
-
Toward Few-step Adversarial Training from a Frequency Perspective
Authors:
Hans Shih-Han Wang,
Cory Cornelius,
Brandon Edwards,
Jason Martin
Abstract:
We investigate adversarial-sample generation methods from a frequency domain perspective and extend standard $l_{\infty}$ Projected Gradient Descent (PGD) to the frequency domain. The resulting method, which we call Spectral Projected Gradient Descent (SPGD), has better success rate compared to PGD during early steps of the method. Adversarially training models using SPGD achieves greater adversar…
▽ More
We investigate adversarial-sample generation methods from a frequency domain perspective and extend standard $l_{\infty}$ Projected Gradient Descent (PGD) to the frequency domain. The resulting method, which we call Spectral Projected Gradient Descent (SPGD), has better success rate compared to PGD during early steps of the method. Adversarially training models using SPGD achieves greater adversarial accuracy compared to PGD when holding the number of attack steps constant. The use of SPGD can, therefore, reduce the overhead of adversarial training when utilizing adversarial generation with a smaller number of steps. However, we also prove that SPGD is equivalent to a variant of the PGD ordinarily used for the $l_{\infty}$ threat model. This PGD variant omits the sign function which is ordinarily applied to the gradient. SPGD can, therefore, be performed without explicitly transforming into the frequency domain. Finally, we visualize the perturbations SPGD generates and find they use both high and low-frequency components, which suggests that removing either high-frequency components or low-frequency components is not an effective defense.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Talk Proposal: Towards the Realistic Evaluation of Evasion Attacks using CARLA
Authors:
Cory Cornelius,
Shang-Tse Chen,
Jason Martin,
Duen Horng Chau
Abstract:
In this talk we describe our content-preserving attack on object detectors, ShapeShifter, and demonstrate how to evaluate this threat in realistic scenarios. We describe how we use CARLA, a realistic urban driving simulator, to create these scenarios, and how we use ShapeShifter to generate content-preserving attacks against those scenarios.
In this talk we describe our content-preserving attack on object detectors, ShapeShifter, and demonstrate how to evaluate this threat in realistic scenarios. We describe how we use CARLA, a realistic urban driving simulator, to create these scenarios, and how we use ShapeShifter to generate content-preserving attacks against those scenarios.
△ Less
Submitted 18 April, 2019;
originally announced April 2019.
-
The Efficacy of SHIELD under Different Threat Models
Authors:
Cory Cornelius,
Nilaksh Das,
Shang-Tse Chen,
Li Chen,
Michael E. Kounavis,
Duen Horng Chau
Abstract:
In this appraisal paper, we evaluate the efficacy of SHIELD, a compression-based defense framework for countering adversarial attacks on image classification models, which was published at KDD 2018. Here, we consider alternative threat models not studied in the original work, where we assume that an adaptive adversary is aware of the ensemble defense approach, the defensive pre-processing, and the…
▽ More
In this appraisal paper, we evaluate the efficacy of SHIELD, a compression-based defense framework for countering adversarial attacks on image classification models, which was published at KDD 2018. Here, we consider alternative threat models not studied in the original work, where we assume that an adaptive adversary is aware of the ensemble defense approach, the defensive pre-processing, and the architecture and weights of the models used in the ensemble. We define scenarios with varying levels of threat and empirically analyze the proposed defense by varying the degree of information available to the attacker, spanning from a full white-box attack to the gray-box threat model described in the original work. To evaluate the robustness of the defense against an adaptive attacker, we consider the targeted-attack success rate of the Projected Gradient Descent (PGD) attack, which is a strong gradient-based adversarial attack proposed in adversarial machine learning research. We also experiment with training the SHIELD ensemble from scratch, which is different from re-training using a pre-trained model as done in the original work. We find that the targeted PGD attack has a success rate of 64.3% against the original SHIELD ensemble in the full white box scenario, but this drops to 48.9% if the models used in the ensemble are trained from scratch instead of being retrained. Our experiments further reveal that an ensemble whose models are re-trained indeed have higher correlation in the cosine similarity space, and models that are trained from scratch are less vulnerable to targeted attacks in the white-box and gray-box scenarios.
△ Less
Submitted 2 August, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector
Authors:
Shang-Tse Chen,
Cory Cornelius,
Jason Martin,
Duen Horng Chau
Abstract:
Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detecto…
▽ More
Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems.
△ Less
Submitted 30 April, 2019; v1 submitted 16 April, 2018;
originally announced April 2018.