Showing 1–50 of 112 results for author: Chaudhuri, K

  1. arXiv:2405.19156  [pdf, other]

    cs.LG

    Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift

    Authors: Robi Bhattacharjee, Nick Rittler, Kamalika Chaudhuri

    Abstract: Many machine learning models appear to deploy effortlessly under distribution shift, and perform well on a target distribution that is considerably different from the training distribution. Yet, learning theory of distribution shift bounds performance on the target distribution as a function of the discrepancy between the source and target, rarely guaranteeing high target accuracy. Motivated by th…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2405.15140  [pdf, other]

    cs.LG

    Better Membership Inference Privacy Measurement through Discrepancy

    Authors: Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

    Abstract: Membership Inference Attacks have emerged as a dominant method for empirically measuring privacy leakage from machine learning models. Here, privacy is measured by the advantage or gap between a score or a function computed on the training and the test data. A major barrier to the practical deployment of these attacks is that they do not scale to large well-generalized models -- either the…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 9 pages

  4. arXiv:2405.02665  [pdf, ps, other]

    cs.CR

    Metric Differential Privacy at the User-Level

    Authors: Jacob Imola, Amrita Roy Chowdhury, Kamalika Chaudhuri

    Abstract: Metric differential privacy (DP) provides heterogeneous privacy guarantees based on a distance between the pair of inputs. It is a widely popular notion of privacy since it captures the natural privacy semantics for many applications (such as location data) and results in better utility than standard DP. However, prior work in metric DP has primarily focused on the item-level setting…

    Submitted 4 May, 2024; originally announced May 2024.

  5. arXiv:2404.10960  [pdf, other]

    cs.CL cs.AI

    Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations

    Authors: Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim

    Abstract: A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability. Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety. In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering…

    Submitted 16 April, 2024; originally announced April 2024.

  6. arXiv:2404.02866  [pdf, other]

    cs.LG cs.CR cs.CY stat.ML

    Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

    Authors: Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

    Abstract: Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bou…

    Submitted 17 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 18 pages, 6 figures

  7. arXiv:2403.14421  [pdf, other]

    cs.LG cs.CR cs.CV

    DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

    Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

    Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guara…

    Submitted 13 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  8. arXiv:2403.05598  [pdf, other]

    cs.CR cs.LG

    Privacy Amplification for the Gaussian Mechanism via Bounded Support

    Authors: Shengyuan Hu, Saeed Mahloujifar, Virginia Smith, Kamalika Chaudhuri, Chuan Guo

    Abstract: Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset. These guarantees can be desirable compared to vanilla DP in real-world settings as they tightly upper-bound the privacy leakage for a specific individual in an actual…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 23 pages, 4 figures

  9. arXiv:2403.02506  [pdf, other]

    cs.CV cs.LG

    Differentially Private Representation Learning via Image Captioning

    Authors: Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo

    Abstract: Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn…

    Submitted 4 March, 2024; originally announced March 2024.

  10. arXiv:2402.12572  [pdf, other]

    cs.LG cs.AI cs.CR

    FairProof : Confidential and Certifiable Fairness for Neural Networks

    Authors: Chhavi Yadav, Amrita Roy Chowdhury, Dan Boneh, Kamalika Chaudhuri

    Abstract: Machine learning models are increasingly used in societal applications, yet legal and privacy concerns demand that they very often be kept confidential. Consequently, there is a growing distrust about the fairness properties of these models in the minds of consumers, who are often at the receiving end of model predictions. To this end, we propose FairProof - a system that uses Zero-Knowledge Proof…

    Submitted 19 February, 2024; originally announced February 2024.

  11. arXiv:2402.11526  [pdf, other]

    cs.CR

    Measuring Privacy Loss in Distributed Spatio-Temporal Data

    Authors: Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri

    Abstract: Statistics about traffic flow and people's movement gathered from multiple geographical locations in a distributed manner are the driving force powering many applications, such as traffic prediction, demand prediction, and restaurant occupancy reports. However, these statistics are often based on sensitive location data of people, and hence privacy has to be preserved while releasing them. The sta…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Chrome PDF viewer might not display Figures 3 and 4 properly

  12. arXiv:2402.02103  [pdf, other]

    cs.CV cs.LG

    Déjà Vu Memorization in Vision-Language Models

    Authors: Bargav Jayaraman, Chuan Guo, Kamalika Chaudhuri

    Abstract: Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call déjà vu…

    Submitted 3 February, 2024; originally announced February 2024.

  13. arXiv:2401.04578  [pdf, other]

    cs.CV

    Effective pruning of web-scale datasets based on complexity of concept clusters

    Authors: Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

    Abstract: Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today's most effective pruning method on ImageNet clusters data samples in…

    Submitted 12 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024, code available at https://github.com/amro-kamal/effective_pruning

  14. arXiv:2311.00682  [pdf, other]

    physics.ins-det cs.LG

    Deep Learning-Based Classification of Gamma Photon Interactions in Room-Temperature Semiconductor Radiation Detectors

    Authors: Sandeep K. Chaudhuri, Qinyang Li, Krishna C. Mandal, Jianjun Hu

    Abstract: Photon counting radiation detectors have become an integral part of medical imaging modalities such as Positron Emission Tomography or Computed Tomography. One of the most promising detectors is the wide bandgap room temperature semiconductor detector, which depends on the interaction of gamma/x-ray photons with the detector material. This interaction involves Compton scattering, which leads to multiple interaction ph…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 17 pages

  15. arXiv:2310.06237  [pdf, other]

    cs.LG cs.CR

    Differentially Private Multi-Site Treatment Effect Estimation

    Authors: Tatsuki Koga, Kamalika Chaudhuri, David Page

    Abstract: Patient privacy is a major barrier to healthcare AI. For confidentiality reasons, most patient data remains in silos in separate hospitals, preventing the design of data-driven healthcare AI systems that need large volumes of patient data to make effective decisions. A solution to this is collective learning across multiple sites through federated learning with differential privacy. However, litera…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 16 pages

  16. arXiv:2310.01202  [pdf, other]

    stat.ML cs.LG

    Unified Uncertainty Calibration

    Authors: Kamalika Chaudhuri, David Lopez-Paz

    Abstract: To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not a…

    Submitted 18 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  17. arXiv:2309.00008  [pdf, other]

    cs.CV cs.CR cs.LG

    Large-Scale Public Data Improves Differentially Private Image Generation Quality

    Authors: Ruihan Wu, Chuan Guo, Kamalika Chaudhuri

    Abstract: Public data has been frequently used to improve the privacy-accuracy trade-off of differentially private machine learning, but prior work largely assumes that this data comes from the same distribution as the private data. In this work, we look at how to use generic large-scale public data to improve the quality of differentially private image generation in Generative Adversarial Networks (GANs), and pr…

    Submitted 4 August, 2023; originally announced September 2023.

  18. arXiv:2306.08842  [pdf, other]

    cs.CV cs.CR cs.LG

    ViP: A Differentially Private Foundation Model for Computer Vision

    Authors: Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo

    Abstract: Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a…

    Submitted 28 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Code: https://github.com/facebookresearch/ViP-MAE. V2 adds a GitHub link to the code

  19. arXiv:2306.01922  [pdf, other]

    cs.LG

    Agnostic Multi-Group Active Learning

    Authors: Nick Rittler, Kamalika Chaudhuri

    Abstract: Inspired by the problem of improving classification accuracy on rare or hard subsets of a population, there has been recent interest in models of learning where the goal is to generalize to a collection of distributions, each representing a "group". We consider a variant of this problem from the perspective of active learning, where the learner is endowed with the power to decide which examples…

    Submitted 2 June, 2023; originally announced June 2023.

  20. arXiv:2305.11351  [pdf, other]

    cs.LG cs.CL cs.CV

    Data Redaction from Conditional Generative Models

    Authors: Zhifeng Kong, Kamalika Chaudhuri

    Abstract: Deep generative models are known to produce undesirable samples such as harmful content. Traditional mitigation methods include re-training from scratch, filtering, or editing; however, these are either computationally expensive or can be circumvented by third parties. In this paper, we take a different approach and study how to post-edit an already-trained conditional generative model so that it…

    Submitted 20 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: SaTML 2024

  21. arXiv:2304.13850  [pdf, other]

    cs.CV cs.CR cs.LG

    Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning

    Authors: Casey Meehan, Florian Bordes, Pascal Vincent, Kamalika Chaudhuri, Chuan Guo

    Abstract: Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintendedly memorize specific parts in individual training samples rather than learning semantically meaningful associations. In this work, we perform a systematic study of the unintended…

    Submitted 12 December, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  22. arXiv:2303.03648  [pdf, other]

    cs.LG cs.CR

    Can Membership Inferencing be Refuted?

    Authors: Zhifeng Kong, Amrita Roy Chowdhury, Kamalika Chaudhuri

    Abstract: Membership inference (MI) attack is currently the most popular test for measuring privacy leakage in machine learning models. Given a machine learning model, a data point and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model. In this work, we study the reliability of membership inference attacks in practice. Specifically, we sho…

    Submitted 7 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  23. arXiv:2302.13181  [pdf, other]

    cs.LG

    Data-Copying in Generative Models: A Formal Framework

    Authors: Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri

    Abstract: There has been some recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called "data-copying," was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. Motivated by this and the theory of non-parametri…

    Submitted 1 March, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

    Comments: 33 pages

  24. arXiv:2211.10773  [pdf, other]

    cs.LG

    A Two-Stage Active Learning Algorithm for $k$-Nearest Neighbors

    Authors: Nick Rittler, Kamalika Chaudhuri

    Abstract: $k$-nearest neighbor classification is a popular non-parametric method because of desirable properties like automatic adaption to distributional scale changes. Unfortunately, it has thus far proved difficult to design active learning strategies for the training of local voting-based classifiers that naturally retain these desirable properties, and hence active learning strategies for $k…

    Submitted 19 August, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

  25. arXiv:2211.03942  [pdf, other]

    cs.LG cs.CR

    Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

    Authors: Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat

    Abstract: In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing…

    Submitted 9 August, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  26. arXiv:2210.14376  [pdf, other]

    cs.CR

    Robustness of Locally Differentially Private Graph Analysis Against Poisoning

    Authors: Jacob Imola, Amrita Roy Chowdhury, Kamalika Chaudhuri

    Abstract: Locally differentially private (LDP) graph analysis allows private analysis on a graph that is distributed across multiple users. However, such computations are vulnerable to data poisoning attacks where an adversary can skew the results by submitting malformed data. In this paper, we formally study the impact of poisoning attacks for graph degree estimation protocols under LDP. We make two key te…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 22 pages, 6 figures

  27. arXiv:2210.00635  [pdf, other]

    cs.LG stat.ML

    Robust Empirical Risk Minimization with Tolerance

    Authors: Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, Kamalika Chaudhuri

    Abstract: Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) empirical risk minimization (RERM), a simple p…

    Submitted 4 February, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 22 pages, 1 figure, To appear at ALT'23

  28. arXiv:2206.14389  [pdf, other]

    cs.LG stat.ML

    Data Redaction from Pre-trained GANs

    Authors: Zhifeng Kong, Kamalika Chaudhuri

    Abstract: Large pre-trained generative models are known to occasionally output undesirable samples, which undermines their trustworthiness. The common way to mitigate this is to re-train them differently from scratch using different data or different regularization -- which uses a lot of computational resources and does not always fully address the problem. In this work, we take a different, more compute-…

    Submitted 17 January, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: SaTML 2023

  29. arXiv:2206.08556  [pdf, other]

    cs.LG stat.ML

    Thompson Sampling for Robust Transfer in Multi-Task Bandits

    Authors: Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

    Abstract: We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-opt…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: To appear in Proceedings of the 39th International Conference on Machine Learning (ICML-2022)

  30. arXiv:2206.04740  [pdf, other]

    cs.LG

    XAudit : A Theoretical Look at Auditing with Explanations

    Authors: Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

    Abstract: Responsible use of machine learning requires models to be audited for undesirable properties. While a body of work has proposed using explanations for auditing, how to do so and why has remained relatively ill-understood. This work formalizes the role of explanations in auditing and investigates if and how model explanations can help audits. Specifically, we propose explanation-based algorithms fo…

    Submitted 5 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

  31. arXiv:2205.11672  [pdf, other]

    stat.ML cs.LG

    Why does Throwing Away Data Improve Worst-Group Error?

    Authors: Kamalika Chaudhuri, Kartik Ahuja, Martin Arjovsky, David Lopez-Paz

    Abstract: When facing data with imbalanced classes or groups, practitioners follow an intriguing strategy to achieve best results. They throw away examples until the classes or groups are balanced in size, and then perform empirical risk minimization on the reduced training set. This opposes common wisdom in learning theory, where the expected error is supposed to decrease as the dataset grows in size. In t…

    Submitted 21 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

  32. arXiv:2205.04605  [pdf, other]

    cs.LG cs.CL

    Sentence-level Privacy for Document Embeddings

    Authors: Casey Meehan, Khalil Mrini, Kamalika Chaudhuri

    Abstract: User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and languag…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Presented at ACL 2022 main conference

  33. arXiv:2205.01429  [pdf, other]

    cs.CR cs.DB

    Differentially Private Triangle and 4-Cycle Counting in the Shuffle Model

    Authors: Jacob Imola, Takao Murakami, Kamalika Chaudhuri

    Abstract: Subgraph counting is fundamental for analyzing connection patterns or clustering tendencies in graph data. Recent studies have applied LDP (Local Differential Privacy) to subgraph counting to protect user privacy even against a data collector in social networks. However, existing local algorithms suffer from extremely large estimation errors or assume multi-round interaction between users and the…

    Submitted 26 August, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: Full version of the paper accepted at ACM CCS 2022; The first and second authors made equal contribution

  34. arXiv:2203.08134  [pdf, other]

    cs.LG cs.CR

    Privacy-Aware Compression for Federated Data Analysis

    Authors: Kamalika Chaudhuri, Chuan Guo, Mike Rabbat

    Abstract: Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges sep…

    Submitted 9 June, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  35. arXiv:2202.05189  [pdf, other]

    cs.LG stat.ML

    Understanding Rare Spurious Correlations in Neural Networks

    Authors: Yao-Yuan Yang, Chi-Ning Chou, Kamalika Chaudhuri

    Abstract: Neural networks are known to use spurious correlations such as background information for classification. While prior work has looked at spurious correlations that are widespread in the training data, in this work, we investigate how sensitive neural networks are to rare spurious correlations, which may be harder to detect and correct, and may lead to privacy leaks. We introduce spurious patterns…

    Submitted 4 October, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  36. arXiv:2201.12383  [pdf, other]

    cs.LG cs.CR

    Bounding Training Data Reconstruction in Private (Deep) Learning

    Authors: Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten

    Abstract: Differential privacy is widely accepted as the de facto method for preventing data leakage in ML, and conventional wisdom suggests that it offers strong protection against privacy attacks. However, existing semantic guarantees for DP focus on membership inference, which may overestimate the adversary's capabilities and is not applicable when membership status itself is non-sensitive. In this paper…

    Submitted 23 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  37. arXiv:2201.04762  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Privacy Amplification by Subsampling in Time Domain

    Authors: Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri

    Abstract: Aggregate time-series data like traffic flow and site occupancy repeatedly sample statistics from a population across time. Such data can be profoundly useful for understanding trends within a given population, but also pose a significant privacy risk, potentially revealing e.g., who spends time where. Producing a private version of a time-series satisfying the standard definition of Differential…

    Submitted 12 January, 2022; originally announced January 2022.

  38. arXiv:2112.06008  [pdf, ps, other]

    cs.LG

    Privacy Amplification via Shuffling for Linear Contextual Bandits

    Authors: Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta

    Abstract: Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information that may contain sensitive information that needs to be protected. Inspired by this scenario, we study the contextual linear bandit problem with differential privacy (DP) constraints. While the literature has focused on either centralized (joint DP)…

    Submitted 11 December, 2021; originally announced December 2021.

  39. arXiv:2110.06485  [pdf, other]

    cs.CR cs.DB

    Communication-Efficient Triangle Counting under Local Differential Privacy

    Authors: Jacob Imola, Takao Murakami, Kamalika Chaudhuri

    Abstract: Triangle counting in networks under LDP (Local Differential Privacy) is a fundamental task for analyzing connection patterns or calculating a clustering coefficient while strongly protecting sensitive friendships from a central server. In particular, a recent study proposes an algorithm for this task that uses two rounds of interaction between users and the server to significantly reduce estimatio…

    Submitted 4 January, 2024; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Full version of the paper accepted at USENIX Security 2022; The first and second authors made equal contribution; The current version added an addendum to double clipping (Appendix I)

  40. arXiv:2109.06999  [pdf, ps, other]

    cs.LG

    Behavior of k-NN as an Instance-Based Explanation Method

    Authors: Chhavi Yadav, Kamalika Chaudhuri

    Abstract: Adoption of DL models in critical areas has led to an escalating demand for sound explanation methods. Instance-based explanation methods are a popular type that return selective instances from the training set to explain the predictions for a test sample. One way to connect these explanations with prediction is to ask the following counterfactual question - how do the loss and prediction for a…

    Submitted 14 September, 2021; originally announced September 2021.

  41. arXiv:2106.06603  [pdf, other]

    cs.LG cs.CR

    A Shuffling Framework for Local Differential Privacy

    Authors: Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha

    Abstract: LDP deployments are vulnerable to inference attacks as an adversary can link the noisy responses to their identity and subsequently, auxiliary information using the order of the data. An alternative model, shuffle DP, prevents this by shuffling the noisy responses uniformly at random. However, this limits the data learnability -- only symmetric functions (input order agnostic) can be learned. In t…

    Submitted 15 October, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

  42. arXiv:2105.14203  [pdf, other]

    cs.LG stat.ML

    Understanding Instance-based Interpretability of Variational Auto-Encoders

    Authors: Zhifeng Kong, Kamalika Chaudhuri

    Abstract: Instance-based interpretation methods have been widely studied for supervised learning methods as they help explain how black box neural networks predict. However, instance-based interpretations remain ill-understood in the context of unsupervised learning. In this paper, we investigate influence functions [Koh and Liang, 2017], a popular instance-based interpretation method, for a class of deep g…

    Submitted 21 January, 2022; v1 submitted 29 May, 2021; originally announced May 2021.

    Comments: NeurIPS 2021

  43. arXiv:2105.10594  [pdf, other]

    cs.LG cs.CR cs.IT

    Privacy Amplification Via Bernoulli Sampling

    Authors: Jacob Imola, Kamalika Chaudhuri

    Abstract: Balancing privacy and accuracy is a major challenge in designing differentially private machine learning algorithms. One way to improve this tradeoff for free is to leverage the noise in common data operations that already use randomness. Such operations include noisy SGD and data subsampling. The additional noise in these operations may amplify the privacy guarantee of the overall algorithm, a ph…

    Submitted 18 October, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

    Comments: 11 pages, 3 figures. Appeared in TPDP Workshop @ ICML 2021

  44. arXiv:2103.05793  [pdf, ps, other]

    cs.LG stat.ML

    Universal Approximation of Residual Flows in Maximum Mean Discrepancy

    Authors: Zhifeng Kong, Kamalika Chaudhuri

    Abstract: Normalizing flows are a class of flexible deep generative models that offer easy likelihood computation. Despite their empirical success, there is little theoretical understanding of their expressiveness. In this work, we study residual flows, a class of normalizing flows composed of Lipschitz residual blocks. We prove residual flows are universal approximators in maximum mean discrepancy. We prov…

    Submitted 24 June, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

  45. arXiv:2102.11955  [pdf, other]

    cs.AI cs.CR

    Location Trace Privacy Under Conditional Priors

    Authors: Casey Meehan, Kamalika Chaudhuri

    Abstract: Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time. This is primarily due to the tremendous degree of dependence that can be anticipated between points. We propose a Rényi divergence based privacy framework for bounding expected privacy loss for conditionally dependent data. Additionally, we de…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: To be published in the proceedings of AISTATS 2021

  46. arXiv:2102.09086  [pdf, other]

    cs.LG

    Consistent Non-Parametric Methods for Maximizing Robustness

    Authors: Robi Bhattacharjee, Kamalika Chaudhuri

    Abstract: Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is there is an artificial robustness radius $r$ that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of data,…

    Submitted 18 January, 2023; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021

  47. arXiv:2102.07048  [pdf, other]

    cs.LG stat.ML

    Connecting Interpretability and Robustness in Decision Trees through Separation

    Authors: Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri

    Abstract: Recent research has recognized interpretability and robustness as essential properties of trustworthy classification. Curiously, a connection between robustness and interpretability was empirically observed, but the theoretical reasoning behind it remained elusive. In this paper, we rigorously investigate this connection. Specifically, we focus on interpretation using decision trees and robustness…

    Submitted 13 February, 2021; originally announced February 2021.

  48. arXiv:2012.10794  [pdf, other]

    cs.LG stat.ML

    Sample Complexity of Adversarially Robust Linear Classification on Separated Data

    Authors: Robi Bhattacharjee, Somesh Jha, Kamalika Chaudhuri

    Abstract: We consider the sample complexity of learning with adversarial robustness. Most prior theoretical results for this problem have considered a setting where different classes in the data are close together or overlapping. Motivated by some real applications, we consider, in contrast, the well-separated case where there exists a classifier with perfect accuracy and robustness, and show that the sampl…

    Submitted 18 January, 2023; v1 submitted 19 December, 2020; originally announced December 2020.

  49. arXiv:2011.08485  [pdf, other]

    cs.LG stat.ML

    Probing Predictions on OOD Images via Nearest Categories

    Authors: Yao-Yuan Yang, Cyrus Rashtchian, Ruslan Salakhutdinov, Kamalika Chaudhuri

    Abstract: We study out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images. To probe the OOD behavior, we introduce a new measure, nearest category generalization (NCG), where we compute the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set. Our motivation stems from understa…

    Submitted 8 March, 2023; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted by Transactions on Machine Learning Research

  50. arXiv:2011.03186  [pdf, other]

    cs.LG cs.CR

    Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning

    Authors: Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang

    Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC-classes in the realizable setting, but falls short in explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill i…

    Submitted 11 March, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

    Journal ref: Journal of Machine Learning Research 22(262) (2021) 1-44