-
URET: Universal Robustness Evaluation Toolkit (for Evasion)
Authors:
Kevin Eykholt,
Taesung Lee,
Douglas Schales,
Jiyong Jang,
Ian Molloy,
Masha Zorin
Abstract:
Machine learning models are known to be vulnerable to adversarial evasion attacks as illustrated by image classification models. Thoroughly understanding such attacks is critical in order to ensure the safety and robustness of critical AI tasks. However, most evasion attacks are difficult to deploy against a majority of AI systems because they have focused on image domain with only few constraints…
▽ More
Machine learning models are known to be vulnerable to adversarial evasion attacks as illustrated by image classification models. Thoroughly understanding such attacks is critical in order to ensure the safety and robustness of critical AI tasks. However, most evasion attacks are difficult to deploy against a majority of AI systems because they have focused on image domain with only few constraints. An image is composed of homogeneous, numerical, continuous, and independent features, unlike many other input types to AI systems used in practice. Furthermore, some input types include additional semantic and functional constraints that must be observed to generate realistic adversarial inputs. In this work, we propose a new framework to enable the generation of adversarial inputs irrespective of the input type and task domain. Given an input and a set of pre-defined input transformations, our framework discovers a sequence of transformations that result in a semantically correct and functional adversarial input. We demonstrate the generality of our approach on several diverse machine learning tasks with various input representations. We also show the importance of generating adversarial examples as they enable the deployment of mitigation techniques.
△ Less
Submitted 3 August, 2023;
originally announced August 2023.
-
Accelerating Certified Robustness Training via Knowledge Transfer
Authors:
Pratik Vaishnavi,
Kevin Eykholt,
Amir Rahmati
Abstract:
Training deep neural network classifiers that are certifiably robust against adversarial attacks is critical to ensuring the security and reliability of AI-controlled systems. Although numerous state-of-the-art certified training methods have been developed, they are computationally expensive and scale poorly with respect to both dataset and network complexity. Widespread usage of certified traini…
▽ More
Training deep neural network classifiers that are certifiably robust against adversarial attacks is critical to ensuring the security and reliability of AI-controlled systems. Although numerous state-of-the-art certified training methods have been developed, they are computationally expensive and scale poorly with respect to both dataset and network complexity. Widespread usage of certified training is further hindered by the fact that periodic retraining is necessary to incorporate new data and network improvements. In this paper, we propose Certified Robustness Transfer (CRT), a general-purpose framework for reducing the computational overhead of any certifiably robust training method through knowledge transfer. Given a robust teacher, our framework uses a novel training loss to transfer the teacher's robustness to the student. We provide theoretical and empirical validation of CRT. Our experiments on CIFAR-10 show that CRT speeds up certified robustness training by $8 \times$ on average across three different architecture generations while achieving comparable robustness to state-of-the-art methods. We also show that CRT can scale to large-scale datasets like ImageNet.
△ Less
Submitted 25 October, 2022;
originally announced October 2022.
-
Ares: A System-Oriented Wargame Framework for Adversarial ML
Authors:
Farhan Ahmed,
Pratik Vaishnavi,
Kevin Eykholt,
Amir Rahmati
Abstract:
Since the discovery of adversarial attacks against machine learning models nearly a decade ago, research on adversarial machine learning has rapidly evolved into an eternal war between defenders, who seek to increase the robustness of ML models against adversarial attacks, and adversaries, who seek to develop better attacks capable of weakening or defeating these defenses. This domain, however, ha…
▽ More
Since the discovery of adversarial attacks against machine learning models nearly a decade ago, research on adversarial machine learning has rapidly evolved into an eternal war between defenders, who seek to increase the robustness of ML models against adversarial attacks, and adversaries, who seek to develop better attacks capable of weakening or defeating these defenses. This domain, however, has found little buy-in from ML practitioners, who are neither overtly concerned about these attacks affecting their systems in the real world nor are willing to trade off the accuracy of their models in pursuit of robustness against these attacks.
In this paper, we motivate the design and implementation of Ares, an evaluation framework for adversarial ML that allows researchers to explore attacks and defenses in a realistic wargame-like environment. Ares frames the conflict between the attacker and defender as two agents in a reinforcement learning environment with opposing objectives. This allows the introduction of system-level evaluation metrics such as time to failure and evaluation of complex strategies such as moving target defenses. We provide the results of our initial exploration involving a white-box attacker against an adversarially trained defender.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Transferring Adversarial Robustness Through Robust Representation Matching
Authors:
Pratik Vaishnavi,
Kevin Eykholt,
Amir Rahmati
Abstract:
With the widespread use of machine learning, concerns over its security and reliability have become prevalent. As such, many have developed defenses to harden neural networks against adversarial examples, imperceptibly perturbed inputs that are reliably misclassified. Adversarial training in which adversarial examples are generated and used during training is one of the few known defenses able to…
▽ More
With the widespread use of machine learning, concerns over its security and reliability have become prevalent. As such, many have developed defenses to harden neural networks against adversarial examples, imperceptibly perturbed inputs that are reliably misclassified. Adversarial training in which adversarial examples are generated and used during training is one of the few known defenses able to reliably withstand such attacks against neural networks. However, adversarial training imposes a significant training overhead and scales poorly with model complexity and input dimension. In this paper, we propose Robust Representation Matching (RRM), a low-cost method to transfer the robustness of an adversarially trained model to a new model being trained for the same task irrespective of architectural differences. Inspired by student-teacher learning, our method introduces a novel training loss that encourages the student to learn the teacher's robust representations. Compared to prior works, RRM is superior with respect to both model performance and adversarial training time. On CIFAR-10, RRM trains a robust model $\sim 1.8\times$ faster than the state-of-the-art. Furthermore, RRM remains effective on higher-dimensional datasets. On Restricted-ImageNet, RRM trains a ResNet50 model $\sim 18\times$ faster than standard adversarial training.
△ Less
Submitted 5 May, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Separation of Powers in Federated Learning
Authors:
Pau-Chen Cheng,
Kevin Eykholt,
Zhongshu Gu,
Hani Jamjoom,
K. R. Jayaram,
Enriquillo Valdez,
Ashish Verma
Abstract:
Federated Learning (FL) enables collaborative training among mutually distrusting parties. Model updates, rather than training data, are concentrated and fused in a central aggregation server. A key security challenge in FL is that an untrustworthy or compromised aggregation process might lead to unforeseeable information leakage. This challenge is especially acute due to recently demonstrated att…
▽ More
Federated Learning (FL) enables collaborative training among mutually distrusting parties. Model updates, rather than training data, are concentrated and fused in a central aggregation server. A key security challenge in FL is that an untrustworthy or compromised aggregation process might lead to unforeseeable information leakage. This challenge is especially acute due to recently demonstrated attacks that have reconstructed large fractions of training data from ostensibly "sanitized" model updates.
In this paper, we introduce TRUDA, a new cross-silo FL system, employing a trustworthy and decentralized aggregation architecture to break down information concentration with regard to a single aggregator. Based on the unique computational properties of model-fusion algorithms, all exchanged model updates in TRUDA are disassembled at the parameter-granularity and re-stitched to random partitions designated for multiple TEE-protected aggregators. Thus, each aggregator only has a fragmentary and shuffled view of model updates and is oblivious to the model architecture. Our new security mechanisms can fundamentally mitigate training reconstruction attacks, while still preserving the final accuracy of trained models and kee** performance overheads low.
△ Less
Submitted 19 May, 2021;
originally announced May 2021.
-
Adaptive Verifiable Training Using Pairwise Class Similarity
Authors:
Shiqi Wang,
Kevin Eykholt,
Taesung Lee,
Jiyong Jang,
Ian Molloy
Abstract:
Verifiable training has shown success in creating neural networks that are provably robust to a given amount of noise. However, despite only enforcing a single robustness criterion, its performance scales poorly with dataset complexity. On CIFAR10, a non-robust LeNet model has a 21.63% error rate, while a model created using verifiable training and a L-infinity robustness criterion of 8/255, has a…
▽ More
Verifiable training has shown success in creating neural networks that are provably robust to a given amount of noise. However, despite only enforcing a single robustness criterion, its performance scales poorly with dataset complexity. On CIFAR10, a non-robust LeNet model has a 21.63% error rate, while a model created using verifiable training and a L-infinity robustness criterion of 8/255, has an error rate of 57.10%. Upon examination, we find that when labeling visually similar classes, the model's error rate is as high as 61.65%. We attribute the loss in performance to inter-class similarity. Similar classes (i.e., close in the feature space) increase the difficulty of learning a robust model. While it's desirable to train a robust model for a large robustness region, pairwise class similarities limit the potential gains. Also, consideration must be made regarding the relative cost of mistaking similar classes. In security or safety critical tasks, similar classes are likely to belong to the same group, and thus are equally sensitive.
In this work, we propose a new approach that utilizes inter-class similarity to improve the performance of verifiable training and create robust models with respect to multiple adversarial criteria. First, we use agglomerate clustering to group similar classes and assign robustness criteria based on the similarity between clusters. Next, we propose two methods to apply our approach: (1) Inter-Group Robustness Prioritization, which uses a custom loss term to create a single model with multiple robustness guarantees and (2) neural decision trees, which trains multiple sub-classifiers with different robustness guarantees and combines them in a decision tree architecture. On Fashion-MNIST and CIFAR10, our approach improves clean performance by 9.63% and 30.89% respectively. On CIFAR100, our approach improves clean performance by 26.32%.
△ Less
Submitted 14 December, 2020;
originally announced December 2020.
-
Can Attention Masks Improve Adversarial Robustness?
Authors:
Pratik Vaishnavi,
Tianji Cong,
Kevin Eykholt,
Atul Prakash,
Amir Rahmati
Abstract:
Deep Neural Networks (DNNs) are known to be susceptible to adversarial examples. Adversarial examples are maliciously crafted inputs that are designed to fool a model, but appear normal to human beings. Recent work has shown that pixel discretization can be used to make classifiers for MNIST highly robust to adversarial examples. However, pixel discretization fails to provide significant protectio…
▽ More
Deep Neural Networks (DNNs) are known to be susceptible to adversarial examples. Adversarial examples are maliciously crafted inputs that are designed to fool a model, but appear normal to human beings. Recent work has shown that pixel discretization can be used to make classifiers for MNIST highly robust to adversarial examples. However, pixel discretization fails to provide significant protection on more complex datasets. In this paper, we take the first step towards reconciling these contrary findings. Focusing on the observation that discrete pixelization in MNIST makes the background completely black and foreground completely white, we hypothesize that the important property for increasing robustness is the elimination of image background using attention masks before classifying an object. To examine this hypothesis, we create foreground attention masks for two different datasets, GTSRB and MS-COCO. Our initial results suggest that using attention mask leads to improved robustness. On the adversarially trained classifiers, we see an adversarial robustness increase of over 20% on MS-COCO.
△ Less
Submitted 21 December, 2019; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
Authors:
Pratik Vaishnavi,
Kevin Eykholt,
Atul Prakash,
Amir Rahmati
Abstract:
Adversarial machine learning is a well-studied field of research where an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversari…
▽ More
Adversarial machine learning is a well-studied field of research where an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversarial samples, has proven to be an effective defense with respect to a certain class of attacks. However, adversarial training is computationally expensive and its improvements are limited to a single model. In this work, we take a first step toward creating a model-agnostic adversarial defense. We propose Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries. We show that AAA allows us to achieve a partially model-agnostic defense by training a single autoencoder to protect multiple pre-trained classifiers; achieving adversarial performance on par or better than adversarial training without modifying the classifiers. Furthermore, we demonstrate that AAA can be used to create a fully model-agnostic defense for MNIST and Fashion MNIST datasets by improving the adversarial performance of a never before seen pre-trained classifier by at least 45% with no additional training. Finally, using a natural image corruption dataset, we show that our approach improves robustness to naturally corrupted images,which has been identified as strongly indicative of true adversarial robustness.
△ Less
Submitted 29 March, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Robust Classification using Robust Feature Augmentation
Authors:
Kevin Eykholt,
Swati Gupta,
Atul Prakash,
Amir Rahmati,
Pratik Vaishnavi,
Haizhong Zheng
Abstract:
Existing deep neural networks, say for image classification, have been shown to be vulnerable to adversarial images that can cause a DNN misclassification, without any perceptible change to an image. In this work, we propose shock absorbing robust features such as binarization, e.g., rounding, and group extraction, e.g., color or shape, to augment the classification pipeline, resulting in more rob…
▽ More
Existing deep neural networks, say for image classification, have been shown to be vulnerable to adversarial images that can cause a DNN misclassification, without any perceptible change to an image. In this work, we propose shock absorbing robust features such as binarization, e.g., rounding, and group extraction, e.g., color or shape, to augment the classification pipeline, resulting in more robust classifiers. Experimentally, we show that augmenting ML models with these techniques leads to improved overall robustness on adversarial inputs as well as significant improvements in training time. On the MNIST dataset, we achieved 14x speedup in training time to obtain 90% adversarial accuracy com-pared to the state-of-the-art adversarial training method of Madry et al., as well as retained higher adversarial accuracy over a broader range of attacks. We also find robustness improvements on traffic sign classification using robust feature augmentation. Finally, we give theoretical insights for why one can expect robust feature augmentation to reduce adversarial input space
△ Less
Submitted 17 September, 2019; v1 submitted 26 May, 2019;
originally announced May 2019.
-
Designing Adversarially Resilient Classifiers using Resilient Feature Engineering
Authors:
Kevin Eykholt,
Atul Prakash
Abstract:
We provide a methodology, resilient feature engineering, for creating adversarially resilient classifiers. According to existing work, adversarial attacks identify weakly correlated or non-predictive features learned by the classifier during training and design the adversarial noise to utilize these features. Therefore, highly predictive features should be used first during classification in order…
▽ More
We provide a methodology, resilient feature engineering, for creating adversarially resilient classifiers. According to existing work, adversarial attacks identify weakly correlated or non-predictive features learned by the classifier during training and design the adversarial noise to utilize these features. Therefore, highly predictive features should be used first during classification in order to determine the set of possible output labels. Our methodology focuses the problem of designing resilient classifiers into a problem of designing resilient feature extractors for these highly predictive features. We provide two theorems, which support our methodology. The Serial Composition Resilience and Parallel Composition Resilience theorems show that the output of adversarially resilient feature extractors can be combined to create an equally resilient classifier. Based on our theoretical results, we outline the design of an adversarially resilient classifier.
△ Less
Submitted 17 December, 2018;
originally announced December 2018.
-
Physical Adversarial Examples for Object Detectors
Authors:
Kevin Eykholt,
Ivan Evtimov,
Earlence Fernandes,
Bo Li,
Amir Rahmati,
Florian Tramer,
Atul Prakash,
Tadayoshi Kohno,
Dawn Song
Abstract:
Deep neural networks (DNNs) are vulnerable to adversarial examples-maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, to create perturbations on physical objects that fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-criti…
▽ More
Deep neural networks (DNNs) are vulnerable to adversarial examples-maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, to create perturbations on physical objects that fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-critical cyber-physical systems. In this work, we extend physical attacks to more challenging object detection models, a broader class of deep learning algorithms widely used to detect and label multiple objects within a scene. Improving upon a previous physical attack on image classifiers, we create perturbed physical objects that are either ignored or mislabeled by object detection models. We implement a Disappearance Attack, in which we cause a Stop sign to "disappear" according to the detector-either by covering thesign with an adversarial Stop sign poster, or by adding adversarial stickers onto the sign. In a video recorded in a controlled lab environment, the state-of-the-art YOLOv2 detector failed to recognize these adversarial Stop signs in over 85% of the video frames. In an outdoor experiment, YOLO was fooled by the poster and sticker attacks in 72.5% and 63.5% of the video frames respectively. We also use Faster R-CNN, a different object detection model, to demonstrate the transferability of our adversarial perturbations. The created poster perturbation is able to fool Faster R-CNN in 85.9% of the video frames in a controlled lab environment, and 40.2% of the video frames in an outdoor environment. Finally, we present preliminary results with a new Creation Attack, where in innocuous physical stickers fool a model into detecting nonexistent objects.
△ Less
Submitted 5 October, 2018; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Tyche: Risk-Based Permissions for Smart Home Platforms
Authors:
Amir Rahmati,
Earlence Fernandes,
Kevin Eykholt,
Atul Prakash
Abstract:
Emerging smart home platforms, which interface with a variety of physical devices and support third-party application development, currently use permission models inspired by smartphone operating systems-they group functionally similar device operations into separate units, and require users to grant apps access to devices at that granularity. Unfortunately, this leads to two issues: (1) apps that…
▽ More
Emerging smart home platforms, which interface with a variety of physical devices and support third-party application development, currently use permission models inspired by smartphone operating systems-they group functionally similar device operations into separate units, and require users to grant apps access to devices at that granularity. Unfortunately, this leads to two issues: (1) apps that do not require access to all of the granted device operations have overprivileged access to them, (2) apps might pose a higher risk to users than needed because physical device operations are fundamentally risk-asymmetric-"door.unlock" provides access to burglars, and "door.lock" can potentially lead to getting locked out. Overprivileged apps with access to mixed-risk operations only increase the potential for damage. We present Tyche, a system that leverages the risk-asymmetry in physical device operations to limit the risk that apps pose to smart home users, without increasing the user's decision overhead. Tyche introduces the notion of risk-based permissions. When using risk-based permissions, device operations are grouped into units of similar risk, and users grant apps access to devices at that risk-based granularity. Starting from a set of permissions derived from the popular Samsung SmartThings platform, we conduct a user study involving domain-experts and Mechanical Turk users to compute a relative ranking of risks associated with device operations. We find that user assessment of risk closely matches that of domain experts. Using this ranking, we define risk-based grou**s of device operations, and apply it to existing SmartThings apps, showing that risk-based permissions indeed limit risk if apps are malicious or exploitable.
△ Less
Submitted 3 December, 2018; v1 submitted 14 January, 2018;
originally announced January 2018.
-
Note on Attacking Object Detectors with Adversarial Stickers
Authors:
Kevin Eykholt,
Ivan Evtimov,
Earlence Fernandes,
Bo Li,
Dawn Song,
Tadayoshi Kohno,
Amir Rahmati,
Atul Prakash,
Florian Tramer
Abstract:
Deep learning has proven to be a powerful tool for computer vision and has seen widespread adoption for numerous tasks. However, deep learning algorithms are known to be vulnerable to adversarial examples. These adversarial inputs are created such that, when provided to a deep learning algorithm, they are very likely to be mislabeled. This can be problematic when deep learning is used to assist in…
▽ More
Deep learning has proven to be a powerful tool for computer vision and has seen widespread adoption for numerous tasks. However, deep learning algorithms are known to be vulnerable to adversarial examples. These adversarial inputs are created such that, when provided to a deep learning algorithm, they are very likely to be mislabeled. This can be problematic when deep learning is used to assist in safety critical decisions. Recent research has shown that classifiers can be attacked by physical adversarial examples under various physical conditions. Given the fact that state-of-the-art objection detection algorithms are harder to be fooled by the same set of adversarial examples, here we show that these detectors can also be attacked by physical adversarial examples. In this note, we briefly show both static and dynamic test results. We design an algorithm that produces physical adversarial inputs, which can fool the YOLO object detector and can also attack Faster-RCNN with relatively high success rate based on transferability. Furthermore, our algorithm can compress the size of the adversarial inputs to stickers that, when attached to the targeted object, result in the detector either mislabeling or not detecting the object a high percentage of the time. This note provides a small set of results. Our upcoming paper will contain a thorough evaluation on other object detectors, and will present the algorithm.
△ Less
Submitted 23 July, 2018; v1 submitted 21 December, 2017;
originally announced December 2017.
-
Robust Physical-World Attacks on Deep Learning Models
Authors:
Kevin Eykholt,
Ivan Evtimov,
Earlence Fernandes,
Bo Li,
Amir Rahmati,
Chaowei Xiao,
Atul Prakash,
Tadayoshi Kohno,
Dawn Song
Abstract:
Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations.Therefore, understanding adversarial examples in the…
▽ More
Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations.Therefore, understanding adversarial examples in the physical world is an important step towards develo** resilient learning algorithms. We propose a general attack algorithm,Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. Witha perturbation in the form of only black and white stickers,we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8%of the captured video frames obtained on a moving vehicle(field test) for the target classifier.
△ Less
Submitted 10 April, 2018; v1 submitted 27 July, 2017;
originally announced July 2017.
-
Internet of Things Security Research: A Rehash of Old Ideas or New Intellectual Challenges?
Authors:
Earlence Fernandes,
Amir Rahmati,
Kevin Eykholt,
Atul Prakash
Abstract:
The Internet of Things (IoT) is a new computing paradigm that spans wearable devices, homes, hospitals, cities, transportation, and critical infrastructure. Building security into this new computing paradigm is a major technical challenge today. However, what are the security problems in IoT that we can solve using existing security principles? And, what are the new problems and challenges in this…
▽ More
The Internet of Things (IoT) is a new computing paradigm that spans wearable devices, homes, hospitals, cities, transportation, and critical infrastructure. Building security into this new computing paradigm is a major technical challenge today. However, what are the security problems in IoT that we can solve using existing security principles? And, what are the new problems and challenges in this space that require new security mechanisms? This article summarizes the intellectual similarities and differences between classic information technology security research and IoT security research.
△ Less
Submitted 18 July, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.