Search | arXiv e-print repository

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Authors: Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan

Abstract: Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently unlearning is often discussed as an approach for removal of impermissible knowledge i.e. knowledge that the model should not poss… ▽ More Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently unlearning is often discussed as an approach for removal of impermissible knowledge i.e. knowledge that the model should not possess such as unlicensed copyrighted, inaccurate, or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper we revisit the paradigm in which unlearning is used for in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation. We discuss feasibility of ununlearning for modern LLMs and examine broader implications. △ Less

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2406.08039 [pdf, other]

Beyond the Mean: Differentially Private Prototypes for Private Transfer Learning

Authors: Dariush Wahdany, Matthew Jagielski, Adam Dziedzic, Franziska Boenisch

Abstract: Machine learning (ML) models have been shown to leak private information from their training datasets. Differential Privacy (DP), typically implemented through the differential private stochastic gradient descent algorithm (DP-SGD), has become the standard solution to bound leakage from the models. Despite recent improvements, DP-SGD-based approaches for private learning still usually struggle in… ▽ More Machine learning (ML) models have been shown to leak private information from their training datasets. Differential Privacy (DP), typically implemented through the differential private stochastic gradient descent algorithm (DP-SGD), has become the standard solution to bound leakage from the models. Despite recent improvements, DP-SGD-based approaches for private learning still usually struggle in the high privacy ($\varepsilon\le1)$ and low data regimes, and when the private training datasets are imbalanced. To overcome these limitations, we propose Differentially Private Prototype Learning (DPPL) as a new paradigm for private transfer learning. DPPL leverages publicly pre-trained encoders to extract features from private data and generates DP prototypes that represent each private class in the embedding space and can be publicly released for inference. Since our DP prototypes can be obtained from only a few private training data points and without iterative noise addition, they offer high-utility predictions and strong privacy guarantees even under the notion of pure DP. We additionally show that privacy-utility trade-offs can be further improved when leveraging the public data beyond pre-training of the encoder: in particular, we can privately sample our DP prototypes from the publicly available data points used to train the encoder. Our experimental evaluation with four state-of-the-art encoders, four vision datasets, and under different data and imbalancedness regimes demonstrate DPPL's high performance under strong privacy guarantees in challenging private learning setups. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS 2024

MSC Class: 68T01

arXiv:2405.20485 [pdf, other]

Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

Authors: Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea

Abstract: Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves i… ▽ More Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves impressive utility in many applications, its adoption to enable personalized generative models introduces new security risks. In this work, we propose new attack surfaces for an adversary to compromise a victim's RAG system, by injecting a single malicious document in its knowledge database. We design Phantom, general two-step attack framework against RAG augmented LLMs. The first step involves crafting a poisoned document designed to be retrieved by the RAG system within the top-k results only when an adversarial trigger, a specific sequence of words acting as backdoor, is present in the victim's queries. In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks in the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.02052 [pdf, other]

Noise Masking Attacks and Defenses for Pretrained Speech Models

Authors: Matthew Jagielski, Om Thakkar, Lun Wang

Abstract: Speech models are often trained on sensitive data in order to improve model performance, leading to potential privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance which is partially replaced with noise. They show that when a record has been seen at training time,… ▽ More Speech models are often trained on sensitive data in order to improve model performance, leading to potential privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance which is partially replaced with noise. They show that when a record has been seen at training time, the model will transcribe the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models, to attack pretrained speech encoders. Our method fine-tunes the encoder to produce an ASR model, and then performs noise masking on this model, which we find recovers private information from the pretraining data, despite the model never having seen transcripts at pretraining time! We show how to improve the precision of these attacks and investigate a number of countermeasures to our attacks. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: accepted to ICASSP 2024

arXiv:2403.06634 [pdf, other]

Stealing Part of a Production Language Model

Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr

Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \… ▽ More We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under \$2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.09403 [pdf, other]

Auditing Private Prediction

Authors: Karan Chadha, Matthew Jagielski, Nicolas Papernot, Christopher Choquette-Choo, Milad Nasr

Abstract: Differential privacy (DP) offers a theoretical upper bound on the potential privacy leakage of analgorithm, while empirical auditing establishes a practical lower bound. Auditing techniques exist forDP training algorithms. However machine learning can also be made private at inference. We propose thefirst framework for auditing private prediction where we instantiate adversaries with varying poiso… ▽ More Differential privacy (DP) offers a theoretical upper bound on the potential privacy leakage of analgorithm, while empirical auditing establishes a practical lower bound. Auditing techniques exist forDP training algorithms. However machine learning can also be made private at inference. We propose thefirst framework for auditing private prediction where we instantiate adversaries with varying poisoningand query capabilities. This enables us to study the privacy leakage of four private prediction algorithms:PATE [Papernot et al., 2016], CaPC [Choquette-Choo et al., 2020], PromptPATE [Duan et al., 2023],and Private-kNN [Zhu et al., 2020]. To conduct our audit, we introduce novel techniques to empiricallyevaluate privacy leakage in terms of Renyi DP. Our experiments show that (i) the privacy analysis ofprivate prediction can be improved, (ii) algorithms which are easier to poison lead to much higher privacyleakage, and (iii) the privacy leakage is significantly lower for adversaries without query control than thosewith full control. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2311.17035 [pdf, other]

Scalable Extraction of Training Data from (Production) Language Models

Authors: Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

Abstract: This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from… ▽ More This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2309.05610 [pdf, other]

Privacy Side Channels in Machine Learning Systems

Authors: Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates th… ▽ More Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models. We propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and allow for either enhanced membership inference attacks or even novel threats such as extracting users' test queries. For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees. Moreover, we show that systems which block language models from regenerating training data can be exploited to allow exact reconstruction of private keys contained in the training set -- even if the model did not memorize these keys. Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2307.14692 [pdf, other]

Backdoor Attacks for In-Context Learning with Language Models

Authors: Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

Abstract: Because state-of-the-art language models are expensive to train, most practitioners must make use of one of the few publicly available language models or language model APIs. This consolidation of trust increases the potency of backdoor attacks, where an adversary tampers with a machine learning model in order to make it perform some malicious behavior on inputs that contain a predefined backdoor… ▽ More Because state-of-the-art language models are expensive to train, most practitioners must make use of one of the few publicly available language models or language model APIs. This consolidation of trust increases the potency of backdoor attacks, where an adversary tampers with a machine learning model in order to make it perform some malicious behavior on inputs that contain a predefined backdoor trigger. We show that the in-context learning ability of large language models significantly complicates the question of develo** backdoor attacks, as a successful backdoor must work against various prompting strategies and should not affect the model's general purpose capabilities. We design a new attack for eliciting targeted misclassification when language models are prompted to perform a particular target task and demonstrate the feasibility of this attack by backdooring multiple large language models ranging in size from 1.3 billion to 6 billion parameters. Finally we study defenses to mitigate the potential harms of our attack: for example, while in the white-box setting we show that fine-tuning models for as few as 500 steps suffices to remove the backdoor behavior, in the black-box setting we are unable to develop a successful defense that relies on prompt engineering alone. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: AdvML Frontiers Workshop 2023

arXiv:2306.15447 [pdf, other]

Are aligned neural networks adversarially aligned?

Authors: Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt

Abstract: Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study adversarial alignment, and ask to what extent these models rema… ▽ More Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study adversarial alignment, and ask to what extent these models remain aligned when interacting with an adversarial user who constructs worst-case inputs (adversarial examples). These inputs are designed to cause the model to emit harmful content that would otherwise be prohibited. We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force. As a result, the failure of current attacks should not be seen as proof that aligned text models remain aligned under adversarial inputs. However the recent trend in large-scale ML models is multimodal models that allow users to provide images that influence the text that is generated. We show these models can be easily attacked, i.e., induced to perform arbitrary un-aligned behavior through adversarial perturbation of the input image. We conjecture that improved NLP attacks may demonstrate this same level of adversarial control over text-only models. △ Less

Submitted 6 May, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.00133 [pdf, ps, other]

A Note On Interpreting Canary Exposure

Authors: Matthew Jagielski

Abstract: Canary exposure, introduced in Carlini et al. is frequently used to empirically evaluate, or audit, the privacy of machine learning model training. The goal of this note is to provide some intuition on how to interpret canary exposure, including by relating it to membership inference attacks and differential privacy. Canary exposure, introduced in Carlini et al. is frequently used to empirically evaluate, or audit, the privacy of machine learning model training. The goal of this note is to provide some intuition on how to interpret canary exposure, including by relating it to membership inference attacks and differential privacy. △ Less

Submitted 2 June, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: short note, edited to add a sentence on independence of canary losses, including adding Pillutla et al

arXiv:2305.10403 [pdf, other]

PaLM 2 Technical Report

Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yan** Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yu**g Zhang, Gustavo Hernandez Abrego , et al. (103 additional authors not shown)

Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on… ▽ More We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report. △ Less

Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.08846 [pdf, other]

Privacy Auditing with One (1) Training Run

Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski

Abstract: We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions a… ▽ More We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions about the algorithm and can be applied in the black-box or white-box setting. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.05973 [pdf, other]

Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

Authors: Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

Abstract: We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradients. To address this issue, we propose an approach that prioritizes… ▽ More We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradients. To address this issue, we propose an approach that prioritizes ensuring query privacy prior to training a deep retrieval system. Our method employs DP language models (LMs) to generate private synthetic queries representative of the original data. These synthetic queries can be used in downstream retrieval system training without compromising privacy. Our approach demonstrates a significant enhancement in retrieval quality compared to direct DP-training, all while maintaining query-level privacy guarantees. This work highlights the potential of harnessing LMs to overcome limitations in standard DP-training methods. △ Less

Submitted 23 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: Accepted to NAACL 2024

arXiv:2304.06929 [pdf]

Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus of advancing DP's deployment in real-world applications. Key points and high-level contents of the article were originated from the discussions from "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20… ▽ More In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus of advancing DP's deployment in real-world applications. Key points and high-level contents of the article were originated from the discussions from "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 2022 with experts from industry, academia, and the public sector seeking answers to broad questions pertaining to privacy and its implications in the design of industry-grade systems. This article aims to provide a reference point for the algorithmic and design decisions within the realm of privacy, highlighting important challenges and potential research directions. Covering a wide spectrum of topics, this article delves into the infrastructure needs for designing private systems, methods for achieving better privacy/utility trade-offs, performing privacy attacks and auditing, as well as communicating privacy with broader audiences and stakeholders. △ Less

Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2303.03446 [pdf, other]

Students Parrot Their Teachers: Membership Inference on Model Distillation

Authors: Matthew Jagielski, Milad Nasr, Christopher Choquette-Choo, Katherine Lee, Nicholas Carlini

Abstract: Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled ``student'' models protect the privacy of training data, as they only interact with this data indirectly through a ``teacher'' model. In this work, we design membership inference attacks to systematically study the privacy… ▽ More Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled ``student'' models protect the privacy of training data, as they only interact with this data indirectly through a ``teacher'' model. In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to both the teacher and student training sets. Our new attacks show that distillation alone provides only limited privacy across a number of domains. We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is *never* queried on any actual training points, but only on inputs whose predictions are highly influenced by training data. Finally, we show that our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: 16 pages, 12 figures

arXiv:2302.13464 [pdf, other]

Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators

Authors: Keane Lucas, Matthew Jagielski, Florian Tramèr, Lujo Bauer, Nicholas Carlini

Abstract: It is becoming increasingly imperative to design robust ML defenses. However, recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary. In this work we take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible. We begin by illustrating a new issue with the deployment o… ▽ More It is becoming increasingly imperative to design robust ML defenses. However, recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary. In this work we take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible. We begin by illustrating a new issue with the deployment of randomized defenses that reduces their security compared to their deterministic counterparts. We then provide evidence that making defenses deterministic simplifies robustness evaluation, without reducing the effectiveness of a truly robust defense. Finally, we introduce a new defense evaluation framework that leverages a defense's deterministic nature to better evaluate its adversarial robustness. △ Less

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.10149 [pdf, other]

Poisoning Web-Scale Training Datasets is Practical

Authors: Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr

Abstract: Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet… ▽ More Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content -- such as Wikipedia -- where an attacker only needs a time-limited window to inject malicious examples. In light of both attacks, we notify the maintainers of each affected dataset and recommended several low-overhead defenses. △ Less

Submitted 6 May, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2302.07956 [pdf, other]

Tight Auditing of Differentially Private Machine Learning

Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus… ▽ More Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce non-trivial statistical estimates of the privacy leakage. This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets -- if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.13188 [pdf, other]

Extracting Training Data from Diffusion Models

Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the… ▽ More Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training. △ Less

Submitted 30 January, 2023; originally announced January 2023.

arXiv:2210.17546 [pdf, other]

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

Authors: Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

Abstract: Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures. Many prior works -- and some recently deployed defenses -- focus on "verbatim memorization", defined as a model generation that exactly matches a substring from the training set. We argu… ▽ More Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures. Many prior works -- and some recently deployed defenses -- focus on "verbatim memorization", defined as a model generation that exactly matches a substring from the training set. We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization. Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization. And yet, we demonstrate that this "perfect" filter does not prevent the leakage of training data. Indeed, it is easily circumvented by plausible and minimally modified "style-transfer" prompts -- and in some cases even the non-modified original prompts -- to extract memorized information. We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models. △ Less

Submitted 11 September, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2208.12911 [pdf, other]

Network-Level Adversaries in Federated Learning

Authors: Giorgio Severi, Matthew Jagielski, Gökberk Yar, Yuxuan Wang, Alina Oprea, Cristina Nita-Rotaru

Abstract: Federated learning is a popular strategy for training models on distributed, sensitive data, while preserving data privacy. Prior work identified a range of security threats on federated learning protocols that poison the data or the model. However, federated learning is a networked system where the communication between clients and server plays a critical role for the learning task performance. W… ▽ More Federated learning is a popular strategy for training models on distributed, sensitive data, while preserving data privacy. Prior work identified a range of security threats on federated learning protocols that poison the data or the model. However, federated learning is a networked system where the communication between clients and server plays a critical role for the learning task performance. We highlight how communication introduces another vulnerability surface in federated learning and study the impact of network-level adversaries on training federated learning models. We show that attackers drop** the network traffic from carefully selected clients can significantly decrease model accuracy on a target population. Moreover, we show that a coordinated poisoning campaign from a few clients can amplify the drop** attacks. Finally, we develop a server-side defense which mitigates the impact of our attacks by identifying and up-sampling clients likely to positively contribute towards target accuracy. We comprehensively evaluate our attacks and defenses on three datasets, assuming encrypted communication channels and attackers with partial visibility of the network. △ Less

Submitted 26 August, 2022; originally announced August 2022.

Comments: 12 pages. Appearing at IEEE CNS 2022

arXiv:2208.12348 [pdf, other]

SNAP: Efficient Extraction of Private Properties with Poisoning

Authors: Harsh Chaudhari, John Abascal, Alina Oprea, Matthew Jagielski, Florian Tramèr, Jonathan Ullman

Abstract: Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large… ▽ More Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires lower amounts of poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to infer whether a certain property was present at all during training and estimate the exact proportion of a property of interest efficiently. We evaluate our attack on several properties of varying proportions from four datasets and demonstrate SNAP's generality and effectiveness. An open-source implementation of SNAP can be found at https://github.com/johnmath/snap-sp23. △ Less

Submitted 21 June, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: 28 pages, 16 figures

arXiv:2207.00099 [pdf, other]

Measuring Forgetting of Memorized Training Examples

Authors: Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang

Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what… ▽ More Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst-case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets - for instance those examples used to pre-train a model - may observe privacy benefits at the expense of examples seen later. △ Less

Submitted 9 May, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

Comments: Appeared at ICLR '23, 22 pages, 12 figures

arXiv:2206.10469 [pdf, other]

The Privacy Onion Effect: Memorization is Relative

Authors: Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Abstract: Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privac… ▽ More Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect, and understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users. △ Less

Submitted 22 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2205.09986 [pdf, other]

SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

Authors: Harsh Chaudhari, Matthew Jagielski, Alina Oprea

Abstract: Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and can be tampered with in poisoning attacks. In this work, we argue t… ▽ More Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and can be tampered with in poisoning attacks. In this work, we argue that model ensembles, implemented in our framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks. The natural partitioning of data amongst owners in MPC training allows this approach to be highly scalable at training time, provide provable protection from poisoning attacks, and provably defense against a number of privacy attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to poisoning on several machine learning datasets and models trained in end-to-end and transfer learning scenarios. For instance, SafeNet reduces backdoor attack success significantly, while achieving $39\times$ faster training and $36 \times$ less communication than the four-party MPC framework of Dalskov et al. Our experiments show that ensembling retains these benefits even in many non-iid settings. The simplicity, cheap setup, and robustness properties of ensembling make it a strong first choice for training ML models privately in MPC. △ Less

Submitted 8 September, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2205.06369 [pdf, other]

How to Combine Membership-Inference Attacks on Multiple Updated Models

Authors: Matthew Jagielski, Stanley Wu, Alina Oprea, Jonathan Ullman, Roxana Geambasu

Abstract: A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time, on data that often shifts in distribution, giving the attacker more informa… ▽ More A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time, on data that often shifts in distribution, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI. A key part of our approach is to leverage rich information from standalone MI attacks mounted separately against the original and updated models, and to combine this information in specific ways to improve attack effectiveness. We propose a set of combination functions and tuning methods for each, and present both analytical and quantitative justification for various options. Our results on four public datasets show that our attacks are effective at using update information to give the adversary a significant advantage over attacks on standalone models, but also compared to a prior MI attack that takes advantage of model updates in a related machine-unlearning setting. We perform the first measurements of the impact of distribution shift on MI attacks with model updates, and show that a more drastic distribution shift results in significantly higher MI risk than a gradual shift. Our code is available at https://www.github.com/stanleykywu/model-updates. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: 31 pages, 9 figures

arXiv:2205.02414 [pdf, other]

doi 10.1145/3531146.3533128

Subverting Fair Image Search with Generative Adversarial Perturbations

Authors: Avijit Ghosh, Matthew Jagielski, Christo Wilson

Abstract: In this work we explore the intersection fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data? To investigate this question, we present a case study in which we develop and then attack a… ▽ More In this work we explore the intersection fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data? To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified using a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer significant unfair advantage to people from the majority class relative to fairly-ranked baseline search results. We demonstrate that our attacks are robust across a number of variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model. Our findings highlight the danger of deploying fair machine learning algorithms in-the-wild when (1) the data necessary to achieve fairness may be adversarially manipulated, and (2) the models themselves are not robust against attacks. △ Less

Submitted 6 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

Comments: Accepted as a full paper at the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT 22)

arXiv:2204.00032 [pdf, other]

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

Authors: Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini

Abstract: We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of machine learning training data. Our attacks… ▽ More We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of machine learning training data. Our attacks are effective across membership inference, attribute inference, and data extraction. For example, our targeted attacks can poison <0.1% of the training dataset to boost the performance of inference attacks by 1 to 2 orders of magnitude. Further, an adversary who controls a significant fraction of the training data (e.g., 50%) can launch untargeted attacks that enable 8x more precise inference on all other users' otherwise-private data points. Our results cast doubts on the relevance of cryptographic privacy guarantees in multiparty computation protocols for machine learning, if parties can arbitrarily select their share of training data. △ Less

Submitted 6 October, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

Comments: ACM CCS 2022

arXiv:2202.12219 [pdf, other]

Debugging Differential Privacy: A Case Study for Privacy Auditing

Authors: Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Abstract: Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private s… ▽ More Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private schemes. In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee. △ Less

Submitted 28 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.07646 [pdf, other]

Quantifying Memorization Across Neural Language Models

Authors: Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang

Abstract: Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe thr… ▽ More Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. Memorization significantly grows as we increase (1) the capacity of a model, (2) the number of times an example has been duplicated, and (3) the number of tokens of context used to prompt the model. Surprisingly, we find the situation becomes more complicated when generalizing these results across model families. On the whole, we find that memorization in LMs is more prevalent than previously believed and will likely get worse as models continues to scale, at least without active mitigations. △ Less

Submitted 6 March, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

arXiv:2112.12938 [pdf, other]

Counterfactual Memorization in Neural Language Models

Authors: Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

Abstract: Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization cri… ▽ More Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time. △ Less

Submitted 13 October, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: NeurIPS 2023; 42 pages, 33 figures

arXiv:2012.07805 [pdf, other]

Extracting Training Data from Large Language Models

Authors: Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and ar… ▽ More It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. Worryingly, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models. △ Less

Submitted 15 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

arXiv:2006.14026 [pdf, other]

Subpopulation Data Poisoning Attacks

Authors: Matthew Jagielski, Giorgio Severi, Niklas Pousette Harger, Alina Oprea

Abstract: Machine learning systems are deployed in critical settings, but they might fail in unexpected ways, impacting the accuracy of their predictions. Poisoning attacks against machine learning induce adversarial modification of data used by a machine learning algorithm to selectively change its output when it is deployed. In this work, we introduce a novel data poisoning attack called a \emph{subpopula… ▽ More Machine learning systems are deployed in critical settings, but they might fail in unexpected ways, impacting the accuracy of their predictions. Poisoning attacks against machine learning induce adversarial modification of data used by a machine learning algorithm to selectively change its output when it is deployed. In this work, we introduce a novel data poisoning attack called a \emph{subpopulation attack}, which is particularly relevant when datasets are large and diverse. We design a modular framework for subpopulation attacks, instantiate it with different building blocks, and show that the attacks are effective for a variety of datasets and machine learning models. We further optimize the attacks in continuous domains using influence functions and gradient optimization methods. Compared to existing backdoor poisoning attacks, subpopulation attacks have the advantage of inducing misclassification in naturally distributed data points at inference time, making the attacks extremely stealthy. We also show that our attack strategy can be used to improve upon existing targeted attacks. We prove that, under some assumptions, subpopulation attacks are impossible to defend against, and empirically demonstrate the limitations of existing defenses against our attacks, highlighting the difficulty of protecting machine learning against this threat. △ Less

Submitted 12 May, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: May12 update: add sever + backdoor defenses, comparison to witches' brew attack, better comparison to related work, transferability of representations for cmatch

arXiv:2006.07709 [pdf, other]

Auditing Differentially Private Machine Learning: How Private is Private SGD?

Authors: Matthew Jagielski, Jonathan Ullman, Alina Oprea

Abstract: We investigate whether Differentially Private SGD offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) proposed this connection between differential privacy and data poisoning as a defense against data poisoning, our us… ▽ More We investigate whether Differentially Private SGD offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) proposed this connection between differential privacy and data poisoning as a defense against data poisoning, our use as a tool for understanding the privacy of a specific mechanism is new. More generally, our work takes a quantitative, empirical approach to understanding the privacy afforded by specific implementations of differentially private algorithms that we believe has the potential to complement and influence analytical work on differential privacy. △ Less

Submitted 13 June, 2020; originally announced June 2020.

arXiv:2003.04884 [pdf, other]

Cryptanalytic Extraction of Neural Network Models

Authors: Nicholas Carlini, Matthew Jagielski, Ilya Mironov

Abstract: We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such. Given oracle access to a neural network, we introduce a differential attack that can efficiently steal the parameters of the remote model up to floating point precision. Our attack relies on the fact that ReLU neural networks are piecewise linear functions,… ▽ More We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such. Given oracle access to a neural network, we introduce a differential attack that can efficiently steal the parameters of the remote model up to floating point precision. Our attack relies on the fact that ReLU neural networks are piecewise linear functions, and thus queries at the critical points reveal information about the model parameters. We evaluate our attack on multiple neural network models and extract models that are 2^20 times more precise and require 100x fewer queries than prior work. For example, we extract a 100,000 parameter neural network trained on the MNIST digit recognition task with 2^21.5 queries in under an hour, such that the extracted model agrees with the oracle on all inputs up to a worst-case error of 2^-25, or a model with 4,000 parameters in 2^18.5 queries with worst-case error of 2^-40.4. Code is available at https://github.com/google-research/cryptanalytic-model-extraction. △ Less

Submitted 22 July, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

arXiv:1909.01838 [pdf, other]

High Accuracy and High Fidelity Extraction of Neural Networks

Authors: Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

Abstract: In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we dev… ▽ More In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model---i.e., extracting a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical functionally-equivalent extraction attack for direct extraction (i.e., without training) of a model's weights. We perform experiments both on academic datasets and a state-of-the-art image classifier trained with 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates the practicality of model extraction attacks against production-grade systems. △ Less

Submitted 3 March, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: USENIX Security 2020, 18 pages, 6 figures

arXiv:1812.02696 [pdf, other]

Differentially Private Fair Learning

Authors: Matthew Jagielski, Michael Kearns, Jieming Mao, Alina Oprea, Aaron Roth, Saeed Sharifi-Malvajerdi, Jonathan Ullman

Abstract: Motivated by settings in which predictive models may be required to be non-discriminatory with respect to certain attributes (such as race), but even collecting the sensitive attribute may be forbidden or restricted, we initiate the study of fair learning under the constraint of differential privacy. We design two learning algorithms that simultaneously promise differential privacy and equalized o… ▽ More Motivated by settings in which predictive models may be required to be non-discriminatory with respect to certain attributes (such as race), but even collecting the sensitive attribute may be forbidden or restricted, we initiate the study of fair learning under the constraint of differential privacy. We design two learning algorithms that simultaneously promise differential privacy and equalized odds, a 'fairness' condition that corresponds to equalizing false positive and negative rates across protected groups. Our first algorithm is a private implementation of the equalized odds post-processing approach of [Hardt et al., 2016]. This algorithm is appealingly simple, but must be able to use protected group membership explicitly at test time, which can be viewed as a form of 'disparate treatment'. Our second algorithm is a differentially private version of the oracle-efficient in-processing approach of [Agarwal et al., 2018] that can be used to find the optimal fair classifier, given access to a subroutine that can solve the original (not necessarily fair) learning problem. This algorithm is more complex but need not have access to protected group membership at test time. We identify new tradeoffs between fairness, accuracy, and privacy that emerge only when requiring all three properties, and show that these tradeoffs can be milder if group membership may be used at test time. We conclude with a brief experimental evaluation. △ Less

Submitted 31 May, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

arXiv:1809.02861 [pdf, other]

Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks

Authors: Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, Fabio Roli

Abstract: Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed to investigate the transferability o… ▽ More Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for evasion and poisoning attacks, and a formal definition of transferability of such attacks. We highlight two main factors contributing to attack transferability: the intrinsic adversarial vulnerability of the target model, and the complexity of the surrogate model used to optimize the attack. Based on these insights, we define three metrics that impact an attack's transferability. Interestingly, our results derived from theoretical analysis hold for both evasion and poisoning attacks, and are confirmed experimentally using a wide range of linear and non-linear classifiers and datasets. △ Less

Submitted 13 June, 2019; v1 submitted 8 September, 2018; originally announced September 2018.

MSC Class: 68T10; 68T45

arXiv:1804.00308 [pdf, other]

Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

Authors: Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, Bo Li

Abstract: As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the re… ▽ More As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We evaluate extensively our attacks and defenses on three realistic datasets from health care, loan assessment, and real estate domains. △ Less

Submitted 28 September, 2021; v1 submitted 1 April, 2018; originally announced April 2018.

Comments: Preprint of the work accepted for publication at the 39th IEEE Symposium on Security and Privacy, San Francisco, CA, USA, May 21-23, 2018; Sept 28 '21 update: add citation to trimmed losses

arXiv:1610.08921 [pdf, other]

doi 10.1016/j.physa.2017.04.115

Theory of earthquakes interevent times applied to financial markets

Authors: Maciej Jagielski, Ryszard Kutner, Didier Sornette

Abstract: We analyze the probability density function (PDF) of waiting times between financial loss exceedances. The empirical PDFs are fitted with the self-excited Hawkes conditional Poisson process with a long power law memory kernel. The Hawkes process is the simplest extension of the Poisson process that takes into account how past events influence the occurrence of future events. By analyzing the empir… ▽ More We analyze the probability density function (PDF) of waiting times between financial loss exceedances. The empirical PDFs are fitted with the self-excited Hawkes conditional Poisson process with a long power law memory kernel. The Hawkes process is the simplest extension of the Poisson process that takes into account how past events influence the occurrence of future events. By analyzing the empirical data for 15 different financial assets, we show that the formalism of the Hawkes process used for earthquakes can successfully model the PDF of interevent times between successive market losses. △ Less

Submitted 28 October, 2016; v1 submitted 27 October, 2016; originally announced October 2016.

arXiv:1610.08918 [pdf, ps, other]

doi 10.1016/j.physa.2017.01.077

Income and wealth distribution of the richest Norwegian individuals: An inequality analysis

Authors: Maciej Jagielski, Kordian Czyżewski, Ryszard Kutner, H. Eugene Stanley

Abstract: Using the empirical data from the Norwegian tax office, we analyse the wealth and income of the richest individuals in Norway during the period 2010--2013. We find that both annual income and wealth level of the richest individuals are describable using the Pareto law. We find that the robust mean Pareto exponent over the four-year period to be $\approx 2.3$ for income and $\approx 1.5$ for wealth… ▽ More Using the empirical data from the Norwegian tax office, we analyse the wealth and income of the richest individuals in Norway during the period 2010--2013. We find that both annual income and wealth level of the richest individuals are describable using the Pareto law. We find that the robust mean Pareto exponent over the four-year period to be $\approx 2.3$ for income and $\approx 1.5$ for wealth. △ Less

Submitted 27 October, 2016; originally announced October 2016.

arXiv:1509.06315 [pdf, other]

doi 10.1103/PhysRevE.94.042305

Universality of market superstatistics

Authors: Mateusz Denys, Maciej Jagielski, Tomasz Gubiec, Ryszard Kutner, H. Eugene Stanley

Abstract: We use a continuous-time random walk (CTRW) to model market fluctuation data from times when traders experience excessive losses or excessive profits. We analytically derive "superstatistics" that accurately model empirical market activity data (supplied by Bogachev, Ludescher, Tsallis, and Bunde)that exhibit transition thresholds. We measure the interevent times between excessive losses and exces… ▽ More We use a continuous-time random walk (CTRW) to model market fluctuation data from times when traders experience excessive losses or excessive profits. We analytically derive "superstatistics" that accurately model empirical market activity data (supplied by Bogachev, Ludescher, Tsallis, and Bunde)that exhibit transition thresholds. We measure the interevent times between excessive losses and excessive profits, and use the mean interevent time as a control variable to derive a universal description of empirical data collapse. Our superstatistic value is a weighted sum of two components, (i) a powerlaw corrected by the lower incomplete gamma function, which asymptotically tends toward robustness but initially gives an exponential, and (ii) a powerlaw damped by the upper incomplete gamma function, which tends toward the power-law only during short interevent times. We find that the scaling shape exponents that drive both components subordinate themselves and a "superscaling" configuration emerges. We use superstatistics to describe the hierarchical activity when component (i) reproduces the negative feedback and component (ii) reproduces the stylized fact of volatility clustering. Our results indicate that there is a functional (but not literal) balance between excessive profits and excessive losses that can be described using the same body of superstatistics, but different calibration values and driving parameters. △ Less

Submitted 10 September, 2015; originally announced September 2015.

Journal ref: Phys. Rev. E 94, 042305 (2016)

arXiv:1411.1560 [pdf, ps, other]

doi 10.1016/j.physa.2015.03.071

Income Distribution in the European Union Versus in the United States

Authors: Maciej Jagielski, Rafał Duczmal, Ryszard Kutner

Abstract: We prove that the refined approach -- our extension of the Yakovenko et al. formalism -- is universal in the sense that it describes well both household incomes in the European Union and the individual incomes in the United States for social classes of any income. This formalism allowed the study of the impact of the recent world-wide financial crisis on the annual incomes of different social clas… ▽ More We prove that the refined approach -- our extension of the Yakovenko et al. formalism -- is universal in the sense that it describes well both household incomes in the European Union and the individual incomes in the United States for social classes of any income. This formalism allowed the study of the impact of the recent world-wide financial crisis on the annual incomes of different social classes. Hence, we indicate the existence of a possible precursor of a market crisis. Besides, we find the most painful impact of the crisis on incomes of all social classes. △ Less

Submitted 6 November, 2014; originally announced November 2014.

arXiv:1312.2722 [pdf, ps, other]

Modelling of the European Union income distribution by extended Yakovenko formula

Authors: Maciej Jagielski, Ryszard Kutner

Abstract: We found a unified formula for description of the household incomes of all society classes, for instance, for the European Union in years 2005-2010. The formula is more general than well known that of Yakovenko et al. because, it satisfactorily describes not only the household incomes of low- and medium-income society classes but also the household incomes of the high-income society class. As a st… ▽ More We found a unified formula for description of the household incomes of all society classes, for instance, for the European Union in years 2005-2010. The formula is more general than well known that of Yakovenko et al. because, it satisfactorily describes not only the household incomes of low- and medium-income society classes but also the household incomes of the high-income society class. As a striking result, we found that the high-income society class almost disappeared in year 2009, in opposite to situation in remaining years, where this class played a significant role. △ Less

Submitted 10 December, 2013; originally announced December 2013.

arXiv:1312.2362 [pdf, other]

Modelling the income distribution in the European Union: An application for the initial analysis of the recent worldwide financial crisis

Authors: Maciej Jagielski, Ryszard Kutner

Abstract: By using methods of statistical physics, we focus on the quantitative analysis of the economic income data descending from different databases. To explain our approach, we introduce the necessary theoretical background, the extended Yakovenko et al. (EY) model. This model gives an analytical description of the annual household incomes of all society classes in the European Union (i.e., the low-, m… ▽ More By using methods of statistical physics, we focus on the quantitative analysis of the economic income data descending from different databases. To explain our approach, we introduce the necessary theoretical background, the extended Yakovenko et al. (EY) model. This model gives an analytical description of the annual household incomes of all society classes in the European Union (i.e., the low-, medium-, and high-income ones) by a single unified formula based on unified formalism. We show that the EY model is very useful for the analyses of various income datasets, in particular, in the case of a smooth matching of two different datasets. The completed database which we have constructed using this matching emphasises the significance of the high-income society class in the analysis of all household incomes. For instance, the Pareto exponent, which characterises this class, defines the Zipf law having an exponent much lower than the one characterising the medium-income society class. This result makes it possible to clearly distinguish between medium- and high-income society classes. By using our approach, we found that the high-income society class almost disappeared in 2009, which defines this year as the most difficult for the EU. To our surprise, this is a contrast with 2008, considered the first year of a worldwide financial crisis, when the status of the high-income society class was similar to that of 2010. This, perhaps, emphasises that the crisis in the EU was postponed by about one year in comparison with the United States. △ Less

Submitted 9 December, 2013; originally announced December 2013.

arXiv:1301.6519 [pdf, ps, other]

doi 10.12693/APhysPolA.123.538

Ab initio analysis of all income society classes in the European Union

Authors: Maciej Jagielski, Ryszard Kutner

Abstract: We found a unified formula for description of the household incomes of all society classes, for instance, of those of the European Union in year 2007. This formula is a stationary solution of the threshold Fokker-Planck equation (derived from the threshold nonlinear Langevin one). The formula is more general than the well known that of Yakovenko et al. because it satisfactorily describes not only… ▽ More We found a unified formula for description of the household incomes of all society classes, for instance, of those of the European Union in year 2007. This formula is a stationary solution of the threshold Fokker-Planck equation (derived from the threshold nonlinear Langevin one). The formula is more general than the well known that of Yakovenko et al. because it satisfactorily describes not only household incomes of low- and medium-income society classes but also the household incomes of the high-income society class. △ Less

Submitted 28 January, 2013; originally announced January 2013.

Comments: Accepted for publication in Acta Physica Polonica A. arXiv admin note: substantial text overlap with arXiv:1301.2076

arXiv:1301.2076 [pdf, ps, other]

doi 10.1016/j.physa.2013.01.028

Modeling of income distribution in the European Union with the Fokker-Planck equation

Authors: Maciej Jagielski, Ryszard Kutner

Abstract: Herein, we applied statistical physics to study incomes of three (low-, medium- and high-income) society classes instead of the two (low- and medium-income)classes studied so far. In the frame of the threshold nonlinear Langevin dynamics and its threshold Fokker-Planck counterpart, we derived a unified formula for description of income of all society classes, by way of example, of those of the Eur… ▽ More Herein, we applied statistical physics to study incomes of three (low-, medium- and high-income) society classes instead of the two (low- and medium-income)classes studied so far. In the frame of the threshold nonlinear Langevin dynamics and its threshold Fokker-Planck counterpart, we derived a unified formula for description of income of all society classes, by way of example, of those of the European Union in year 2006 and 2008. Hence, the formula is more general than the well known that of Yakovenko et al. That is, our formula well describes not only two regions but simultaneously the third region in the plot of the complementary cumulative distribution function vs. an annual household income. Furthermore, the known stylised facts concerning this income are well described by our formula. Namely, the formula provides the Boltzmann-Gibbs income distribution function for the low-income society class and the weak Pareto law for the medium-income society class, as expected. Importantly, it predicts (to satisfactory approximation) the Zipf law for the high-income society class. Moreover, the region of medium-income society class is now distinctly reduced because the bottom of high-income society class is distinctly lowered. This reduction made, in fact, the medium-income society class an intermediate-income society class. △ Less

Submitted 10 January, 2013; originally announced January 2013.

Comments: Accepted for publication in Physcia A

Showing 1–48 of 48 results for author: Jagielski, M