Showing 1–50 of 71 results for author: Mądry, A

  1. arXiv:2406.16846  [pdf, other]

    cs.LG cs.CY stat.ML

    Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

    Authors: Saachi Jain, Kimia Hamidieh, Kristian Georgiev, Andrew Ilyas, Marzyeh Ghassemi, Aleksander Madry

    Abstract: Machine learning models can fail on subgroups that are underrepresented during training. While techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and remove…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2405.05596  [pdf, other]

    cs.CY cs.HC cs.IR cs.LG stat.ME

    Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content

    Authors: Sarah H. Cen, Andrew Ilyas, Jennifer Allen, Hannah Li, Aleksander Madry

    Abstract: Most modern recommendation algorithms are data-driven: they generate personalized recommendations by observing users' past behaviors. A common assumption in recommendation is that how a user interacts with a piece of content (e.g., whether they choose to "like" it) is a reflection of the content, but not of the algorithm that generated it. Although this assumption is convenient, it fails to captur…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2404.11534  [pdf, other]

    cs.LG cs.AI stat.ML

    Decomposing and Editing Predictions by Modeling Model Computation

    Authors: Harshay Shah, Andrew Ilyas, Aleksander Madry

    Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components -- simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model co…

    Submitted 17 April, 2024; originally announced April 2024.

  4. arXiv:2403.00194  [pdf, other]

    cs.LG

    Ask Your Distribution Shift if Pre-Training is Right for You

    Authors: Benjamin Cohen-Wang, Joshua Vendrow, Aleksander Madry

    Abstract: Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training can and cannot address. In particular,…

    Submitted 29 February, 2024; originally announced March 2024.

  5. arXiv:2401.12926  [pdf, other]

    cs.LG stat.ML

    DsDm: Model-Aware Dataset Selection with Datamodels

    Authors: Logan Engstrom, Axel Feldmann, Aleksander Madry

    Abstract: When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior. However, in practice the opposite can often happen: we find that selecting according to similarity with "high quality" data sources may not increase (and can ev…

    Submitted 23 January, 2024; originally announced January 2024.

  6. arXiv:2312.17666  [pdf, other]

    cs.CY cs.GT cs.LG stat.ML

    User Strategization and Trustworthy Algorithms

    Authors: Sarah H. Cen, Andrew Ilyas, Aleksander Madry

    Abstract: Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorith…

    Submitted 29 December, 2023; originally announced December 2023.

  7. arXiv:2312.06205  [pdf, other]

    cs.CV cs.LG

    The Journey, Not the Destination: How Data Guides Diffusion Models

    Authors: Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, Aleksander Madry

    Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data -- that is, identifying specific training examples which caused an image to be generated -- remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diff…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 29 pages, 17 figures

  8. arXiv:2307.10163  [pdf, other]

    cs.CR cs.LG stat.ML

    Rethinking Backdoor Attacks

    Authors: Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry

    Abstract: In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. In this work, we present a different approach to the…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: ICML 2023

  9. arXiv:2306.12517  [pdf, other]

    cs.LG cs.CV

    FFCV: Accelerating Training by Removing Data Bottlenecks

    Authors: Guillaume Leclerc, Andrew Ilyas, Logan Engstrom, Sung Min Park, Hadi Salman, Aleksander Madry

    Abstract: We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly mor…

    Submitted 21 June, 2023; originally announced June 2023.

  10. arXiv:2304.10525  [pdf, other]

    cs.CY cs.LG cs.SI stat.AP

    A User-Driven Framework for Regulating and Auditing Social Media

    Authors: Sarah H. Cen, Aleksander Madry, Devavrat Shah

    Abstract: People form judgments and make decisions based on the information that they observe. A growing portion of that information is not only provided, but carefully curated by social media platforms. Although lawmakers largely agree that platforms should not operate without any oversight, there is little consensus on how to regulate social media. There is consensus, however, that creating a strict, glob…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 21 pages, 4 figures

  11. arXiv:2303.14186  [pdf, other]

    stat.ML cs.LG

    TRAK: Attributing Model Behavior at Scale

    Authors: Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

    Abstract: The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of de…

    Submitted 3 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  12. arXiv:2302.07865  [pdf, other]

    cs.LG

    Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation

    Authors: Joshua Vendrow, Saachi Jain, Logan Engstrom, Aleksander Madry

    Abstract: Distribution shift is a major source of failure for machine learning models. However, evaluating model reliability under distribution shift can be challenging, especially since it may be difficult to acquire counterfactual examples that exhibit a specified shift. In this work, we introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, retur…

    Submitted 19 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  13. arXiv:2302.06588  [pdf, other]

    cs.LG

    Raising the Cost of Malicious AI-Powered Image Editing

    Authors: Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, Aleksander Madry

    Abstract: We present an approach to mitigating the risks of malicious image editing posed by large diffusion models. The key idea is to immunize images so as to make them resistant to manipulation by these models. This immunization relies on injection of imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images. We…

    Submitted 13 February, 2023; originally announced February 2023.

  14. arXiv:2211.12491  [pdf, other]

    cs.LG cs.CV stat.ML

    ModelDiff: A Framework for Comparing Learning Algorithms

    Authors: Harshay Shah, Sung Min Park, Andrew Ilyas, Aleksander Madry

    Abstract: We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms. We begin by formalizing this goal as one of finding distinguishing feature transformations, i.e., input transformations that change the predictions of models trained with one learning algorithm but not the other. We then present ModelDiff, a…

    Submitted 22 November, 2022; originally announced November 2022.

  15. arXiv:2207.05739  [pdf, other]

    cs.LG

    A Data-Based Perspective on Transfer Learning

    Authors: Saachi Jain, Hadi Salman, Alaa Khaddaj, Eric Wong, Sung Min Park, Aleksander Madry

    Abstract: It is commonly believed that in transfer learning including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Ou…

    Submitted 12 July, 2022; originally announced July 2022.

  16. arXiv:2207.02842  [pdf, other]

    cs.LG

    When does Bias Transfer in Transfer Learning?

    Authors: Hadi Salman, Saachi Jain, Andrew Ilyas, Logan Engstrom, Eric Wong, Aleksander Madry

    Abstract: Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside. In this work, we demonstrate that there can exist a downside after all: bias transfer, or the tendency for biases of the source model to persist even after adapting the model to the target class. Through a combination of synthetic and natural…

    Submitted 6 July, 2022; originally announced July 2022.

  17. arXiv:2206.14754  [pdf, other]

    cs.LG

    Distilling Model Failures as Directions in Latent Space

    Authors: Saachi Jain, Hannah Lawrence, Ankur Moitra, Aleksander Madry

    Abstract: Existing methods for isolating hard subpopulations and spurious correlations in datasets often require human intervention. This can make these methods labor-intensive and dataset-specific. To address these shortcomings, we present a scalable method for automatically distilling a model's failure modes. Specifically, we harness linear classifiers to identify consistent error patterns, and, in turn,…

    Submitted 2 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

  18. arXiv:2206.11228  [pdf, other]

    q-bio.NC cs.LG

    Adversarially trained neural representations may already be as robust as corresponding biological neural representations

    Authors: Chong Guo, Michael J. Lee, Guillaume Leclerc, Joel Dapello, Yug Rao, Aleksander Madry, James J. DiCarlo

    Abstract: Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: 10 pages, 6 figures, ICML2022

  19. arXiv:2204.08945  [pdf, other]

    cs.CV cs.AI cs.LG

    Missingness Bias in Model Debugging

    Authors: Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry

    Abstract: Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools. However, in computer vision, pixels cannot simply be removed from an image. One thus tends to resort to heuristics such as blacking out pixels, which may in turn introduce bias into the debugging process. We study such biases and, in particular, show how transformer-based architectures ca…

    Submitted 13 June, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Published at ICLR 2022

  20. arXiv:2202.00622  [pdf, other]

    stat.ML cs.CV cs.LG

    Datamodels: Predicting Predictions from Training Data

    Authors: Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry

    Abstract: We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. For any fixed "target" example $x$, training set $S$, and learning algorithm, a datamodel is a parameterized function $2^S \to \mathbb{R}$ that for any subset of $S' \subset S$ -- using only information about which examples of $S$ are contained in $S'$ -- predicts the outcome…

    Submitted 1 February, 2022; originally announced February 2022.

  21. arXiv:2112.15329  [pdf, other]

    cs.LG cs.CV

    On Distinctive Properties of Universal Perturbations

    Authors: Sung Min Park, Kuo-An Wei, Kai Xiao, Jerry Li, Aleksander Madry

    Abstract: We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations. Specifically, we show that targeted UAPs generated by projected gradient descent exhibit two human-aligned properties: semantic locality and spatial invariance, which standard targeted adversarial perturbations lack. We also demonstrate that UAPs contain significantly…

    Submitted 31 December, 2021; originally announced December 2021.

  22. arXiv:2112.01008  [pdf, other]

    cs.LG cs.CV

    Editing a classifier by rewriting its prediction rules

    Authors: Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry

    Abstract: We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our approach requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments, and modifying it to ignore spurious features. Our code is available at https://github.com/MadryLab/EditingClassifiers .

    Submitted 2 December, 2021; originally announced December 2021.

  23. arXiv:2111.10368  [pdf, ps, other]

    cs.DS

    Faster Sparse Minimum Cost Flow by Electrical Flow Localization

    Authors: Kyriakos Axiotis, Aleksander Mądry, Adrian Vladu

    Abstract: We give an $\widetilde{O}({m^{3/2 - 1/762} \log (U+W)})$ time algorithm for minimum cost flow with capacities bounded by $U$ and costs bounded by $W$. For sparse graphs with general capacities, this is the first algorithm to improve over the $\widetilde{O}({m^{3/2} \log^{O(1)} (U+W)})$ running time obtained by an appropriate instantiation of an interior point method [Daitch-Spielman, 2008]. Our…

    Submitted 19 November, 2021; originally announced November 2021.

  24. arXiv:2110.08220  [pdf, other]

    cs.LG cs.CV

    Combining Diverse Feature Priors

    Authors: Saachi Jain, Dimitris Tsipras, Aleksander Madry

    Abstract: To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of feature priors have less overlapping failure modes, and can thus be combin…

    Submitted 14 July, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

  25. arXiv:2110.07719  [pdf, other]

    cs.CV cs.AI cs.LG

    Certified Patch Robustness via Smoothed Vision Transformers

    Authors: Hadi Salman, Saachi Jain, Eric Wong, Aleksander Mądry

    Abstract: Certified patch defenses can guarantee robustness of an image classifier to arbitrary changes within a bounded contiguous region. But, currently, this robustness comes at a cost of degraded standard accuracies and slower inference times. We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incu…

    Submitted 11 October, 2021; originally announced October 2021.

  26. arXiv:2106.03805  [pdf, other]

    cs.CV cs.LG stat.ML

    3DB: A Framework for Debugging Computer Vision Models

    Authors: Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

    Abstract: We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation. We demonstrate, through a wide range of use cases, that 3DB allows users to discover vulnerabilities in computer vision systems and gain insights into how models make decisions. 3DB captures and generalizes many robustness analyses from prior work, and enables one to study th…

    Submitted 7 June, 2021; originally announced June 2021.

  27. arXiv:2105.04857  [pdf, other]

    cs.LG stat.ML

    Leveraging Sparse Linear Layers for Debuggable Deep Networks

    Authors: Eric Wong, Shibani Santurkar, Aleksander Mądry

    Abstract: We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively via numerical and human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, expla…

    Submitted 11 May, 2021; originally announced May 2021.

  28. arXiv:2012.12235  [pdf, other]

    cs.CV cs.LG

    Unadversarial Examples: Designing Objects for Robust Vision

    Authors: Hadi Salman, Andrew Ilyas, Logan Engstrom, Sai Vemprala, Aleksander Madry, Ashish Kapoor

    Abstract: We study a class of realistic computer vision settings wherein one can influence the design of the objects being recognized. We develop a framework that leverages this capability to significantly improve vision models' performance and robustness. This framework exploits the sensitivity of modern machine learning algorithms to input perturbations in order to design "robust objects," i.e., objects t…

    Submitted 22 December, 2020; originally announced December 2020.

  29. arXiv:2012.10544  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

    Authors: Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein

    Abstract: As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the…

    Submitted 31 March, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

  30. arXiv:2008.04859  [pdf, other]

    cs.CV cs.LG stat.ML

    BREEDS: Benchmarks for Subpopulation Shift

    Authors: Shibani Santurkar, Dimitris Tsipras, Aleksander Madry

    Abstract: We develop a methodology for assessing the robustness of models to subpopulation shift---specifically, their ability to generalize to novel data subpopulations that were not observed during training. Our approach leverages the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions. This enables us to synthesize realistic di…

    Submitted 11 August, 2020; originally announced August 2020.

  31. arXiv:2007.08489  [pdf, other]

    cs.CV cs.LG stat.ML

    Do Adversarially Robust ImageNet Models Transfer Better?

    Authors: Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, Aleksander Madry

    Abstract: Transfer learning is a widely-used paradigm in deep learning, where models pre-trained on standard datasets can be efficiently adapted to downstream tasks. Typically, better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of transfer learning performance. In this work, we identify another such aspect: we find that adversarially robust models, whil…

    Submitted 7 December, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020

  32. arXiv:2006.09994  [pdf, other]

    cs.CV cs.LG

    Noise or Signal: The Role of Image Backgrounds in Object Recognition

    Authors: Kai Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry

    Abstract: We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signal on ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds--up to 8…

    Submitted 17 June, 2020; originally announced June 2020.

  33. arXiv:2005.12729  [pdf, other]

    cs.LG cs.RO stat.ML

    Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

    Authors: Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

    Abstract: We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations:" algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm. Seemin…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: ICLR 2020 version. arXiv admin note: text overlap with arXiv:1811.02553

  34. arXiv:2005.11295  [pdf, other]

    cs.CV cs.LG stat.ML

    From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

    Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry

    Abstract: Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset---including the introduc…

    Submitted 22 May, 2020; originally announced May 2020.

  35. arXiv:2005.09619  [pdf, other]

    stat.ML cs.CV cs.LG

    Identifying Statistical Bias in Dataset Replication

    Authors: Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

    Abstract: Dataset replication is a useful tool for assessing whether improvements in test accuracy on a specific benchmark correspond to improvements in models' ability to generalize reliably. In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. We study ImageNet-v2, a replication of the…

    Submitted 2 September, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

  36. arXiv:2003.04863  [pdf, ps, other]

    cs.DS

    Circulation Control for Faster Minimum Cost Flow in Unit-Capacity Graphs

    Authors: Kyriakos Axiotis, Aleksander Mądry, Adrian Vladu

    Abstract: We present an $m^{4/3+o(1)}\log W$-time algorithm for solving the minimum cost flow problem in graphs with unit capacity, where $W$ is the maximum absolute value of any edge weight. For sparse graphs, this improves over the best known running time for this problem and, by well-known reductions, also implies improved running times for the shortest path problem with negative weights, minimum cost bi…

    Submitted 9 April, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: improved running time to m^{4/3+o(1)} log W

  37. arXiv:2002.10376  [pdf, other]

    cs.LG stat.ML

    The Two Regimes of Deep Network Training

    Authors: Guillaume Leclerc, Aleksander Madry

    Abstract: Learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristical. We aim to develop a precise understanding of the effects of different learning rate schedules and the appropriate way to select them. To this end, we isolate two distinct phases of training, the first, which we refer to as the "large-step" regime, exhibits a r…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: 14 pages (5 of appendix), 14 figures

  38. arXiv:2002.08347  [pdf, other]

    cs.LG cs.CR stat.ML

    On Adaptive Attacks to Adversarial Example Defenses

    Authors: Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry

    Abstract: Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS---and chosen for illustrative and pedagogical purposes---can be circumvented despite attempting to perform evaluations using adaptive at…

    Submitted 23 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: NeurIPS 2020

  39. arXiv:1912.02771  [pdf, other]

    stat.ML cs.CR cs.LG

    Label-Consistent Backdoor Attacks

    Authors: Alexander Turner, Dimitris Tsipras, Aleksander Madry

    Abstract: Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, t…

    Submitted 6 December, 2019; v1 submitted 5 December, 2019; originally announced December 2019.

  40. arXiv:1906.09453  [pdf, other]

    cs.CV cs.LG cs.NE stat.ML

    Image Synthesis with a Single (Robust) Classifier

    Authors: Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom, Aleksander Madry

    Abstract: We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis. In contrast to other state-of-the-art approaches, the toolkit we develop is rather minimal: it uses a single, off-the-shelf classifier for all these tasks. The crux of our approach is that we train this classifier to be adversarially robust. It turns out that adversari…

    Submitted 8 August, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

  41. arXiv:1906.00945  [pdf, other]

    stat.ML cs.CV cs.LG cs.NE

    Adversarial Robustness as a Prior for Learned Representations

    Authors: Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry

    Abstract: An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks' representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns…

    Submitted 27 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

  42. arXiv:1905.02175  [pdf, other]

    stat.ML cs.CR cs.CV cs.LG

    Adversarial Examples Are Not Bugs, They Are Features

    Authors: Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

    Abstract: Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans. After capturing…

    Submitted 12 August, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

  43. arXiv:1902.06705  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    On Evaluating Adversarial Robustness

    Authors: Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, Alexey Kurakin

    Abstract: Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose defenses are quickly shown to be incorrect. We believe a large contributing factor is the difficulty of performing security evaluations. In this pa…

    Submitted 20 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: Living document; source available at https://github.com/evaluating-adversarial-robustness/adv-eval-paper/

  44. arXiv:1811.02553  [pdf, other]

    cs.LG cs.NE cs.RO stat.ML

    A Closer Look at Deep Policy Gradients

    Authors: Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

    Abstract: We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. Our results show that the behavior of deep policy gradient algorithms often deviates from…

    Submitted 25 May, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: ICLR 2020 version

  45. arXiv:1811.00636  [pdf, other]

    cs.LG cs.CR stat.ML

    Spectral Signatures in Backdoor Attacks

    Authors: Brandon Tran, Jerry Li, Aleksander Madry

    Abstract: A recent line of work has uncovered a new form of data poisoning: so-called \emph{backdoor} attacks. These attacks are particularly dangerous because they do not affect a network's behavior on typical, benign data. Rather, the network only deviates from its expected output when triggered by a perturbation planted by an adversary. In this paper, we identify a new property of all known backdoor at…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: 16 pages, accepted to NIPS 2018

  46. arXiv:1809.03008  [pdf, other]

    cs.LG cs.CR cs.NE stat.ML

    Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability

    Authors: Kai Y. Xiao, Vincent Tjeng, Nur Muhammad Shafiullah, Aleksander Madry

    Abstract: We explore the concept of co-design in the context of neural network verification. Specifically, we aim to train deep neural networks that not only are robust to adversarial perturbations but also whose robustness can be verified more easily. To this end, we identify two properties of network models - weight sparsity and so-called ReLU stability - that turn out to significantly impact the complexi…

    Submitted 23 April, 2019; v1 submitted 9 September, 2018; originally announced September 2018.

    Journal ref: International Conference on Learning Representations (ICLR) 2019

  47. arXiv:1807.07978  [pdf, other]

    stat.ML cs.CR cs.LG

    Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors

    Authors: Andrew Ilyas, Logan Engstrom, Aleksander Madry

    Abstract: We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available. We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and we demonstrate that the current state-of-the-art methods are optimal in a natural sense. Despite this optimality, we show how to improve black-box attacks by br…

    Submitted 27 March, 2019; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: To appear at ICLR 2019; Code available at https://git.io/blackbox-bandits

  48. arXiv:1805.12152  [pdf, other]

    stat.ML cs.CV cs.LG cs.NE

    Robustness May Be at Odds with Accuracy

    Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry

    Abstract: We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in…

    Submitted 9 September, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: ICLR'19

  49. arXiv:1805.11604  [pdf, other]

    stat.ML cs.LG cs.NE

    How Does Batch Normalization Help Optimization?

    Authors: Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry

    Abstract: Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "i…

    Submitted 14 April, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: In NeurIPS'18

  50. arXiv:1804.11285  [pdf, other]

    cs.LG cs.NE stat.ML

    Adversarially Robust Generalization Requires More Data

    Authors: Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Mądry

    Abstract: Machine learning models are often susceptible to adversarial perturbations of their inputs. Even small perturbations can cause state-of-the-art classifiers with high "standard" accuracy to produce an incorrect prediction with high confidence. To better understand this phenomenon, we study adversarially robust learning from the viewpoint of generalization. We show that already in a simple natural d…

    Submitted 2 May, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: Small changes for biblatex compatibility