
Showing 1–25 of 25 results for author: Tople, S

  1. arXiv:2402.14397  [pdf, other]

    cs.CR cs.LG

    Closed-Form Bounds for DP-SGD against Record-level Inference

    Authors: Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,\delta)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. T…

    Submitted 22 February, 2024; originally announced February 2024.

  2. arXiv:2311.15792  [pdf, other]

    cs.LG cs.CR

    Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

    Authors: Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

    Abstract: Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover…

    Submitted 27 November, 2023; originally announced November 2023.

  3. arXiv:2310.18362  [pdf, ps, other]

    cs.CL cs.CR cs.LG

    SoK: Memorization in General-Purpose Large Language Models

    Authors: Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West

    Abstract: Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to me…

    Submitted 24 October, 2023; originally announced October 2023.

  4. arXiv:2310.08015  [pdf, other]

    cs.LG cs.CR

    Why Train More? Effective and Efficient Membership Inference via Memorization

    Authors: Jihye Choi, Shruti Tople, Varun Chandrasekaran, Somesh Jha

    Abstract: Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains…

    Submitted 11 October, 2023; originally announced October 2023.

  5. arXiv:2306.05093  [pdf, other]

    cs.CR cs.LG

    Investigating the Effect of Misalignment on Membership Privacy in the White-box Setting

    Authors: Ana-Maria Cretu, Daniel Jones, Yves-Alexandre de Montjoye, Shruti Tople

    Abstract: Machine learning models have been shown to leak sensitive information about their training datasets. Models are increasingly deployed on devices, raising concerns that white-box access to the model parameters increases the attack surface compared to black-box access which only provides query access. Directly extending the shadow modelling technique from the black-box to the white-box setting has b…

    Submitted 12 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs 2024)

  6. arXiv:2302.01190  [pdf, other]

    stat.ML cs.CR cs.LG

    On the Efficacy of Differentially Private Few-shot Image Classification

    Authors: Marlon Tobaben, Aliaksandra Shysheya, John Bronskill, Andrew Paverd, Shruti Tople, Santiago Zanella-Beguelin, Richard E Turner, Antti Honkela

    Abstract: There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications including person…

    Submitted 19 December, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 49 pages, 24 figures; published in TMLR 12/2023 https://openreview.net/forum?id=hFsr59Imzm

    Journal ref: Transactions on Machine Learning Research, ISSN 2835-8856, 2023

  7. arXiv:2302.00539  [pdf, other]

    cs.LG

    Analyzing Leakage of Personally Identifiable Information in Language Models

    Authors: Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scr…

    Submitted 23 April, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: IEEE Symposium on Security and Privacy (S&P) 2023

  8. arXiv:2212.10986  [pdf, other]

    cs.LG cs.CR cs.GT

    SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

    Authors: Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

    Abstract: Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy i…

    Submitted 20 April, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 20 pages, to appear in 2023 IEEE Symposium on Security and Privacy

  9. arXiv:2210.01834  [pdf, other]

    cs.LG cs.CR

    Invariant Aggregator for Defending against Federated Backdoor Attacks

    Authors: Xiaoyang Wang, Dimitrios Dimitriadis, Sanmi Koyejo, Shruti Tople

    Abstract: Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defense against backdoor attack…

    Submitted 8 March, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: AISTATS 2024 camera-ready

  10. arXiv:2209.08615  [pdf, other]

    cs.LG cs.AI cs.CR

    Membership Inference Attacks and Generalization: A Causal Perspective

    Authors: Teodora Baluta, Shiqi Shen, S. Hitarth, Shruti Tople, Prateek Saxena

    Abstract: Membership inference (MI) attacks highlight a privacy weakness in present stochastic training methods for neural networks. It is not well understood, however, why they arise. Are they a natural consequence of imperfect generalization only? Which underlying causes should we address during training to mitigate these attacks? Towards answering such questions, we propose the first approach to explain…

    Submitted 30 October, 2022; v1 submitted 18 September, 2022; originally announced September 2022.

    Comments: 26 pages, 15 figures; added CC-license block icons and links, typos corrected, added reference to Github

  11. arXiv:2209.08541  [pdf, other]

    cs.CR cs.LG

    Distribution inference risks: Identifying and mitigating sources of leakage

    Authors: Valentin Hartmann, Léo Meynent, Maxime Peyrard, Dimitrios Dimitriadis, Shruti Tople, Robert West

    Abstract: A large body of work shows that machine learning (ML) models can leak sensitive or confidential information about their training data. Recently, leakage due to distribution inference (or property inference) attacks is gaining attention. In this attack, the goal of an adversary is to infer distributional information about the training data. So far, research on distribution inference has focused on…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 14 pages, 8 figures

  12. arXiv:2206.05199  [pdf, other]

    cs.LG cs.CR

    Bayesian Estimation of Differential Privacy

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

    Abstract: Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the p…

    Submitted 15 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople

  13. arXiv:2110.03369  [pdf, other]

    cs.LG cs.AI cs.CR

    The Connection between Out-of-Distribution Generalization and Privacy of ML Models

    Authors: Divyat Mahajan, Shruti Tople, Amit Sharma

    Abstract: With the goal of generalizing to out-of-distribution (OOD) data, recent domain generalization methods aim to learn "stable" feature representations whose effect on the output remains invariant across domains. Given the theoretical connection between generalization and privacy, we ask whether better OOD generalization leads to better privacy for machine learning models, where privacy is measured th…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Prior version accepted at Workshop on Privacy Preserving Machine Learning, NeurIPS 2020. Code: https://github.com/microsoft/robustdg

  14. arXiv:2105.13144  [pdf, other]

    cs.LG cs.CR

    Causally Constrained Data Synthesis for Private Data Release

    Authors: Varun Chandrasekaran, Darren Edge, Somesh Jha, Amit Sharma, Cheng Zhang, Shruti Tople

    Abstract: Making evidence-based decisions requires data. However, for real-world applications, the privacy of data is critical. Using synthetic data which reflects certain statistical properties of the original data preserves the privacy of the original data. To this end, prior works utilize differentially private data release mechanisms to provide formal privacy guarantees. However, such mechanisms have una…

    Submitted 27 May, 2021; originally announced May 2021.

  15. arXiv:2009.05683  [pdf, other]

    cs.CR cs.LG

    MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models

    Authors: Yixi Xu, Sumit Mukherjee, Xiyang Liu, Shruti Tople, Rahul Dodhia, Juan Lavista Ferres

    Abstract: Generative machine learning models are being increasingly viewed as a way to share sensitive data between institutions. While there has been work on developing differentially private generative modeling approaches, these approaches generally lead to sub-par sample quality, limiting their use in real world applications. Another line of work has focused on developing generative models which lead to…

    Submitted 12 October, 2022; v1 submitted 11 September, 2020; originally announced September 2020.

  16. arXiv:2007.12934  [pdf, other]

    cs.CR cs.LG stat.ML

    SOTERIA: In Search of Efficient Neural Networks for Private Inference

    Authors: Anshul Aggarwal, Trevor E. Carlson, Reza Shokri, Shruti Tople

    Abstract: ML-as-a-service is gaining popularity where a cloud server hosts a trained model and offers prediction (inference) service to users. In this setting, our objective is to protect the confidentiality of both the users' input queries as well as the model parameters at the server, with modest computation and communication overhead. Prior solutions primarily propose fine-tuning cryptographic methods to…

    Submitted 25 July, 2020; originally announced July 2020.

  17. arXiv:2006.07500  [pdf, other]

    cs.LG cs.AI stat.ML

    Domain Generalization using Causal Matching

    Authors: Divyat Mahajan, Shruti Tople, Amit Sharma

    Abstract: In the domain generalization literature, a common objective is to learn representations independent of the domain after conditioning on the class label. We show that this objective is not sufficient: there exist counter-examples where a model fails to generalize to unseen domains even after satisfying class-conditional domain invariance. We formalize this observation through a structural causal mo…

    Submitted 29 June, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139, 2021. (Long Talk)

  18. arXiv:2006.07267  [pdf, other]

    cs.LG cs.CR stat.ML

    Leakage of Dataset Properties in Multi-Party Machine Learning

    Authors: Wanrong Zhang, Shruti Tople, Olga Ohrimenko

    Abstract: Secure multi-party machine learning allows several parties to build a model on their pooled data to increase utility while not explicitly sharing data with each other. We show that such multi-party computation can cause leakage of global dataset properties between the parties even when parties obtain only black-box access to the final model. In particular, a ``curious'' party can infer the distrib…

    Submitted 17 June, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Published in USENIX Security Symposium, 2021

  19. arXiv:2004.02229  [pdf, other]

    cs.CR cs.LG

    FALCON: Honest-Majority Maliciously Secure Framework for Private Deep Learning

    Authors: Sameer Wagh, Shruti Tople, Fabrice Benhamouda, Eyal Kushilevitz, Prateek Mittal, Tal Rabin

    Abstract: We propose Falcon, an end-to-end 3-party protocol for efficient private training and inference of large machine learning models. Falcon presents four main advantages - (i) It is highly expressive with support for high capacity networks such as VGG16 (ii) it supports batch normalization which is important for training complex networks such as AlexNet (iii) Falcon guarantees security with abort agai…

    Submitted 7 September, 2020; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: Revised version, contains some more experiments and fixes minor typos in the paper

  20. arXiv:2001.02438  [pdf, other]

    cs.LG cs.CR cs.IR stat.ML

    To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers

    Authors: Bijeeta Pal, Shruti Tople

    Abstract: Transfer learning --- transferring learned knowledge --- has brought a paradigm shift in the way models are trained. The lucrative benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT that are trained on large corpus…

    Submitted 8 January, 2020; originally announced January 2020.

  21. arXiv:1912.07942  [pdf, other]

    cs.LG cs.CL cs.CR stat.ML

    Analyzing Information Leakage of Updates to Natural Language Models

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Victor Rühle, Andrew Paverd, Olga Ohrimenko, Boris Köpf, Marc Brockschmidt

    Abstract: To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics---\emph{differential score} and \emph{diffe…

    Submitted 5 August, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

  22. arXiv:1912.02919  [pdf, other]

    cs.LG cs.CR stat.ML

    An Empirical Study on the Intrinsic Privacy of SGD

    Authors: Stephanie L. Hyland, Shruti Tople

    Abstract: Introducing noise in the training of machine learning systems is a powerful way to protect individual privacy via differential privacy guarantees, but comes at a cost to utility. This work looks at whether the inherent randomness of stochastic gradient descent (SGD) could contribute to privacy, effectively reducing the amount of \emph{additional} noise required to achieve a given privacy guarantee…

    Submitted 28 February, 2022; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 21 pages, 11 figures, 8 tables

  23. arXiv:1911.09052  [pdf, other]

    cs.GT cs.LG stat.ML

    Collaborative Machine Learning Markets with Data-Replication-Robust Payments

    Authors: Olga Ohrimenko, Shruti Tople, Sebastian Tschiatschek

    Abstract: We study the problem of collaborative machine learning markets where multiple parties can achieve improved performance on their machine learning tasks by combining their training data. We discuss desired properties for these machine learning markets in terms of fair revenue distribution and potential threats, including data replication. We then instantiate a collaborative market for cases where pa…

    Submitted 8 November, 2019; originally announced November 2019.

  24. arXiv:1909.12732  [pdf, other]

    cs.LG stat.ML

    Alleviating Privacy Attacks via Causal Learning

    Authors: Shruti Tople, Amit Sharma, Aditya Nori

    Abstract: Machine learning models, especially deep neural networks, have been shown to be susceptible to privacy attacks such as membership inference where an adversary can detect whether a data point was used for training a black-box model. Such privacy risks are exacerbated when a model's predictions are used on an unseen data distribution. To alleviate privacy attacks, we demonstrate the benefit of predic…

    Submitted 17 July, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: Accepted at International Conference on Machine Learning, 2020

  25. arXiv:1810.00602  [pdf, other]

    cs.CR cs.AI cs.CV

    Privado: Practical and Secure DNN Inference with Enclaves

    Authors: Karan Grover, Shruti Tople, Shweta Shinde, Ranjita Bhagwan, Ramachandran Ramjee

    Abstract: Cloud providers are extending support for trusted hardware primitives such as Intel SGX. Simultaneously, the field of deep learning is seeing enormous innovation as well as an increase in adoption. In this paper, we ask a timely question: "Can third-party cloud services use Intel SGX enclaves to provide practical, yet secure DNN Inference-as-a-service?" We first demonstrate that DNN models executi…

    Submitted 5 September, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

    Comments: 13 pages, 5 figures