Showing 1–50 of 79 results for author: Suresh, T

Searching in archive cs.
  1. arXiv:2404.11607  [pdf, other]

    cs.DS

    Private federated discovery of out-of-vocabulary words for Gboard

    Authors: Ziteng Sun, Peter Kairouz, Haicheng Sun, Adria Gascon, Ananda Theertha Suresh

    Abstract: The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role for improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for G…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2404.09221  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring and Improving Drafts in Blockwise Parallel Decoding

    Authors: Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

    Abstract: Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verifie…

    Submitted 5 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  3. arXiv:2404.01730  [pdf, other]

    cs.LG cs.IT stat.ML

    Asymptotics of Language Model Alignment

    Authors: Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

    Abstract: Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree to which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $φ$ that results in a higher expected reward while keeping $φ$ close to $p.$ A popular alignment method is the KL-constrained reinforcement learning (RL), which choo…

    Submitted 2 April, 2024; originally announced April 2024.

  4. arXiv:2403.17983  [pdf, other]

    cs.CR cs.LG

    Is Watermarking LLM-Generated Code Robust?

    Authors: Tarun Suresh, Shubham Ugare, Gagandeep Singh, Sasa Misailovic

    Abstract: We present the first study of the robustness of existing watermarking techniques on Python code generated by large language models. Although existing works showed that watermarking can be robust for natural language, we show that it is easy to remove these watermarks on code by semantic-preserving transformations.

    Submitted 28 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  5. arXiv:2403.10444  [pdf, other]

    cs.LG cs.CL cs.DS cs.IT

    Optimal Block-Level Draft Verification for Accelerating Speculative Decoding

    Authors: Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh

    Abstract: Speculative decoding has been shown to be an effective method for lossless acceleration of large language models (LLMs) during inference. In each iteration, the algorithm first uses a smaller model to draft a block of tokens. The tokens are then verified by the large model in parallel and only a subset of tokens will be kept to guarantee that the final output follows the distribution of the large model…

    Submitted 15 March, 2024; originally announced March 2024.

  6. arXiv:2403.08100  [pdf, other]

    cs.LG cs.CR cs.DC

    Efficient Language Model Architectures for Differentially Private Federated Learning

    Authors: Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh

    Abstract: Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices. SGD is the standard client optimizer for on-device training in cross-device FL, favored for its memory and computational efficiency. However, in centralized training of neural language models, adaptive optimizers are preferred as th…

    Submitted 12 March, 2024; originally announced March 2024.

  7. arXiv:2403.01632  [pdf, other]

    cs.LG cs.FL cs.PL cs.SE

    SynCode: LLM Generation with Grammar Augmentation

    Authors: Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, Gagandeep Singh

    Abstract: LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format, for their integration with other components in the systems. Typically the format rules e.g., for data serialization formats such as JSON, YAML, or Code in Programming Language are expressed as context-free grammar (CFG). Due to the hallucinations and unreliability…

    Submitted 29 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  8. arXiv:2401.01879  [pdf, other]

    cs.LG cs.CL cs.IT

    Theoretical guarantees on the best-of-n alignment policy

    Authors: Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

    Abstract: A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, and ranked based on a reward function, and the highest ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We di…

    Submitted 3 January, 2024; originally announced January 2024.

  9. arXiv:2312.06658  [pdf, other]

    cs.DS cs.CR cs.IT stat.ML

    Mean estimation in the add-remove model of differential privacy

    Authors: Alex Kulesza, Ananda Theertha Suresh, Yuyan Wang

    Abstract: Differential privacy is often studied under two different models of neighboring datasets: the add-remove model and the swap model. While the swap model is frequently used in the academic literature to simplify analysis, many practical applications rely on the more conservative add-remove model, where obtaining tight results can be difficult. Here, we study the problem of one-dimensional mean estim…

    Submitted 19 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  10. arXiv:2312.03867  [pdf, other]

    cs.LG cs.CY cs.IT stat.ML

    Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

    Authors: Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

    Abstract: Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the performance of a fixed ML model across population groups defined by multiple sensitive attributes (e.g., race and sex and age). Here, the sample complexity for estim…

    Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in the IEEE Journal on Selected Areas in Information Theory (JSAIT)

  11. arXiv:2310.15141  [pdf, other]

    cs.LG cs.CL cs.DS cs.IT

    SpecTr: Fast Speculative Decoding via Optimal Transport

    Authors: Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

    Abstract: Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time making it slow, and even prohibitive in certain tasks. One way to speed up sampling is $\textit{speculative decoding}$: use a small model to sample a $\textit{draft}$ (block or sequence of tokens), and then score a…

    Submitted 17 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  12. arXiv:2307.13347  [pdf, other]

    cs.DS cs.CR cs.IT

    Federated Heavy Hitter Recovery under Linear Sketching

    Authors: Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh

    Abstract: Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible Bloom look-up tables (IBLTs). We also show that our…

    Submitted 25 July, 2023; originally announced July 2023.

  13. arXiv:2307.11106  [pdf, other]

    cs.LG cs.CR cs.IT

    The importance of feature preprocessing for differentially private linear optimization

    Authors: Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

    Abstract: Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, where at each step gradients are clipped and combined with some noise. Given the increasing usage of DPSGD, we ask the question:…

    Submitted 19 February, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

  14. arXiv:2307.04905  [pdf, other]

    cs.LG cs.DC

    FedYolo: Augmenting Federated Learning with Pretrained Transformers

    Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak

    Abstract: The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkab…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 20 pages, 18 figures

  15. arXiv:2305.19521  [pdf, other]

    cs.LG cs.CR cs.PL

    Incremental Randomized Smoothing Certification

    Authors: Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, Sasa Misailovic

    Abstract: Randomized smoothing-based certification is an effective approach for obtaining robustness certificates of deep neural networks (DNNs) against adversarial attacks. This method constructs a smoothed DNN model and certifies its robustness through statistical sampling, but it is computationally expensive, especially when certifying with a large number of samples. Furthermore, when the smoothed model…

    Submitted 10 April, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  16. arXiv:2303.01262  [pdf, other]

    cs.LG cs.CR cs.IT

    Subset-Based Instance Optimality in Private Estimation

    Authors: Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

    Abstract: We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well w…

    Submitted 28 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

  17. arXiv:2302.06869  [pdf, other]

    stat.ML cs.DM cs.IT cs.LG math.PR

    Concentration Bounds for Discrete Distribution Estimation in KL Divergence

    Authors: Clément L. Canonne, Ziteng Sun, Ananda Theertha Suresh

    Abstract: We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator. We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$. We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.

    Submitted 12 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Updated discussion of previous work

  18. arXiv:2301.11219  [pdf, other]

    cs.CL cs.CY

    Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?

    Authors: Shivam Sharma, Atharva Kulkarni, Tharun Suresh, Himanshi Mathur, Preslav Nakov, Md. Shad Akhtar, Tanmoy Chakraborty

    Abstract: Memes can sway people's opinions over social media as they combine visual and textual information in an easy-to-consume manner. Since memes instantly turn viral, it becomes crucial to infer their intent and potentially associated harmfulness to take timely measures as needed. A common problem associated with meme comprehension lies in detecting the entities referenced and characterizing the role o…

    Submitted 10 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted at EACL 2023 (Main Track). 9 Pages (main content), Limitations, Ethical Considerations + 4 Pages (Refs.) + Appendix; 8 Figures; 5 Tables; Paper ID: 804

  19. arXiv:2212.00715  [pdf, other]

    cs.CY cs.CL

    What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes

    Authors: Shivam Sharma, Siddhant Agarwal, Tharun Suresh, Preslav Nakov, Md. Shad Akhtar, Tanmoy Chakraborty

    Abstract: Memes are powerful means for effective communication on social media. Their effortless amalgamation of viral visuals and compelling messages can have far-reaching implications with proper marketing. Previous research on memes has primarily focused on characterizing their affective spectrum and detecting whether the meme's message insinuates any intended harm, such as hate, offense, racism, etc. Ho…

    Submitted 20 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted at AAAI 2023 (Main Track). 7 Pages (main content) + 2 Pages (Refs.); 3 Figures; 6 Tables; Paper ID: 10326 (AAAI'23)

  20. arXiv:2208.06135  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Domain Adaptation from a Public Source

    Authors: Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data. In regression problems with no privacy constraints on the source or target data, a discrepancy…

    Submitted 12 August, 2022; originally announced August 2022.

  21. arXiv:2206.08406  [pdf, other]

    cs.SI cs.AI cs.CL

    Predicting Hate Intensity of Twitter Conversation Threads

    Authors: Qing Meng, Tharun Suresh, Roy Ka-Wei Lee, Tanmoy Chakraborty

    Abstract: Tweets are the most concise form of communication in online social media, wherein a single tweet has the potential to make or break the discourse of the conversation. Online hate speech is more accessible than ever, and stifling its propagation is of utmost importance for social media companies and users for congenial communication. Most of the research barring a recent few has focused on classify…

    Submitted 14 May, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted in Knowledge-Based Systems, 30 pages (main content) + 9 pages (Refs.), 11 figures, 3 tables

  22. arXiv:2206.03886  [pdf, other]

    cs.CL

    Counseling Summarization using Mental Health Knowledge Guided Utterance Filtering

    Authors: Aseem Srivastava, Tharun Suresh, Sarah Peregrine Lord, Md. Shad Akhtar, Tanmoy Chakraborty

    Abstract: The psychotherapy intervention technique is a multifaceted conversation between a therapist and a patient. Unlike general clinical discussions, psychotherapy's core components (viz. symptoms) are hard to distinguish, thus becoming a complex problem to summarize later. A structured counseling conversation may contain discussions about symptoms, history of mental health issues, or the discovery of t…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: Full paper accepted at KDD 2022 -- ADS Track

  23. arXiv:2206.03008  [pdf, other]

    cs.LG cs.CR

    Algorithms for bounding contribution for histogram estimation under user-level privacy

    Authors: Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser

    Abstract: We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribu…

    Submitted 30 June, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: 32 pages, ICML 2023

  24. arXiv:2204.12753  [pdf, other]

    cs.CL

    A Comprehensive Understanding of Code-mixed Language Semantics using Hierarchical Transformer

    Authors: Ayan Sengupta, Tharun Suresh, Md Shad Akhtar, Tanmoy Chakraborty

    Abstract: Being a popular mode of text-based communication in multilingual communities, code-mixing in online social media has become an important subject to study. Learning the semantics and morphology of code-mixed language remains a key challenge, due to scarcity of data and unavailability of robust and language-invariant representation learning technique. Any morphologically-rich language can benefit fr…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 12 pages, 1 figure, 11 tables

  25. arXiv:2204.10376  [pdf, other]

    cs.LG stat.ML

    Differentially Private Learning with Margin Guarantees

    Authors: Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: We present a series of new differentially private (DP) algorithms with dimension-independent margin guarantees. For the family of linear hypotheses, we give a pure DP learning algorithm that benefits from relative deviation margin guarantees, as well as an efficient DP learning algorithm with margin guarantees. We also present a new efficient DP learning algorithm with margin guarantees for kernel…

    Submitted 21 April, 2022; originally announced April 2022.

  26. arXiv:2204.09715  [pdf, other]

    cs.CL cs.LG

    Scaling Language Model Size in Cross-Device Federated Learning

    Authors: Jae Hun Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Theertha Suresh, Shankar Kumar, Rajiv Mathews

    Abstract: Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and co…

    Submitted 24 June, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

  27. arXiv:2203.04925  [pdf, other]

    cs.LG cs.DS cs.IT

    Correlated quantization for distributed mean estimation and optimization

    Authors: Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix Yu

    Abstract: We study the problem of distributed mean estimation and optimization under communication constraints. We propose a correlated quantization protocol whose leading term in the error guarantee depends on the mean deviation of data points rather than only their absolute range. The design doesn't need any prior knowledge on the concentration property of the dataset, which is required to get such depend…

    Submitted 8 July, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

  28. arXiv:2203.03761  [pdf, other]

    cs.LG stat.ML

    The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

    Authors: Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

    Abstract: We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round. Taking into account the constraints imposed by SecAgg, we characterize the fundamental communication cost required to obtain the best accuracy achievable under…

    Submitted 7 March, 2022; originally announced March 2022.

  29. arXiv:2202.00126  [pdf, other]

    cs.SI cs.CY cs.LG

    Handling Bias in Toxic Speech Detection: A Survey

    Authors: Tanmay Garg, Sarah Masud, Tharun Suresh, Tanmoy Chakraborty

    Abstract: Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers of the posts play a crucial role in determining if the content can be flagged as toxic. Adoption of automated toxicity detection models in production can thus lead to a sidelining of the various groups…

    Submitted 15 January, 2023; v1 submitted 26 January, 2022; originally announced February 2022.

    Comments: Accepted in ACM Computing Surveys, 30 pages, 5 figures, 7 tables

  30. arXiv:2111.05320  [pdf, ps, other]

    cs.DS cs.IT math.ST stat.ML

    Robust Estimation for Random Graphs

    Authors: Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

    Abstract: We study the problem of robustly estimating the parameter $p$ of an Erdős-Rényi random graph on $n$ nodes, where a $γ$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a computationally-efficient spectral algorithm which estimates $p$ up to accuracy $\tilde O(\sqrt{p(1-p)}/n + γ\sqrt{p(1-p)} /\sqrt{n}+ γ/n)$ for $γ< 1/60$. Furtherm…

    Submitted 15 February, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

  31. arXiv:2110.15440  [pdf, other]

    cs.CR cs.LG

    HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation

    Authors: Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang

    Abstract: Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a well-designed protocol to securely compute a function. With the non-colluding party assumption, MPC has a cryptographic guarantee that the parties will not learn sensitive information from the computation process, making it an appealing framework for applications that involve privacy-sensitive…

    Submitted 28 October, 2021; originally announced October 2021.

  32. arXiv:2108.02117  [pdf, other]

    cs.LG

    FedJAX: Federated learning simulation with JAX

    Authors: Jae Hun Ro, Ananda Theertha Suresh, Ke Wu

    Abstract: Federated learning is a machine learning technique that enables training across decentralized data. Recently, federated learning has become an active area of research due to an increased focus on privacy and security. In light of this, a variety of open source federated learning libraries have been developed and released. We introduce FedJAX, a JAX-based open source library for federated learning…

    Submitted 5 November, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

  33. arXiv:2107.06917  [pdf, other]

    cs.LG

    A Field Guide to Federated Optimization

    Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, et al. (28 additional authors not shown)

    Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…

    Submitted 14 July, 2021; originally announced July 2021.

  34. arXiv:2106.10370  [pdf, other]

    stat.ML cs.AI cs.LG

    On the benefits of maximum likelihood estimation for Regression and Forecasting

    Authors: Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

    Abstract: We advocate for a practical Maximum Likelihood Estimation (MLE) approach towards designing loss functions for regression and forecasting, as an alternative to the typical approach of direct empirical risk minimization on a specific target metric. The MLE approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference…

    Submitted 9 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  35. arXiv:2105.05180  [pdf, other]

    cs.CR cs.LG

    On the Renyi Differential Privacy of the Shuffle Model

    Authors: Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, Peter Kairouz

    Abstract: The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model. In the shuffle model, each of the $n$ clients randomizes its response using a local differentially private (LDP) mechanism and the untrusted server only receives a random permutation (shuffle) of the client responses without association to ea…

    Submitted 11 May, 2021; originally announced May 2021.

  36. arXiv:2104.02748  [pdf, other]

    cs.LG

    Communication-Efficient Agnostic Federated Averaging

    Authors: Jae Ro, Mingqing Chen, Rajiv Mathews, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: In distributed learning settings such as federated learning, the training algorithm can be potentially biased towards different clients. Mohri et al. (2019) proposed a domain-agnostic learning algorithm, where the model is optimized for any target distribution formed by a mixture of the client distributions in order to overcome this bias. They further proposed an algorithm for the cross-silo feder…

    Submitted 15 June, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

  37. arXiv:2103.03279  [pdf, ps, other]

    cs.LG cs.AI

    Remember What You Want to Forget: Algorithms for Machine Unlearning

    Authors: Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh

    Abstract: We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a model $\widehat{w}$ that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint $z \in S$ can request to be unlearned, thus prompting the learner to modify its output mo…

    Submitted 22 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  38. arXiv:2102.11845  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Learning with User-Level Privacy

    Authors: Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical…

    Submitted 3 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. 43 pages, 0 figures

  39. arXiv:2011.12160  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization

    Authors: Prathamesh Mayekar, Shubham Jha, Ananda Theertha Suresh, Himanshu Tyagi

    Abstract: Communication efficient distributed mean estimation is an important primitive that arises in many distributed learning and optimization scenarios such as federated learning. Without any probabilistic assumptions on the underlying data, we study the problem of distributed mean estimation where the server has access to side information. We propose \emph{Wyner-Ziv estimators}, which are communication…

    Submitted 14 November, 2022; v1 submitted 24 November, 2020; originally announced November 2020.

  40. arXiv:2011.01848  [pdf, other]

    math.ST cs.IT cs.LG stat.ML

    Robust hypothesis testing and distribution estimation in Hellinger distance

    Authors: Ananda Theertha Suresh

    Abstract: We propose a simple robust hypothesis test that has the same sample complexity as that of the optimal Neyman-Pearson test up to constants, but robust to distribution perturbations under Hellinger distance. We discuss the applicability of such a robust test for estimating distributions in Hellinger distance. We empirically demonstrate the power of the test on canonical distributions.

    Submitted 3 November, 2020; originally announced November 2020.

  41. arXiv:2008.11036  [pdf, other]

    cs.LG stat.ML

    A Discriminative Technique for Multiple-Source Adaptation

    Authors: Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh, Ningshan Zhang

    Abstract: We present a new discriminative technique for the multiple-source adaptation, MSA, problem. Unlike previous work, which relies on density estimation for each source domain, our solution only requires conditional probabilities that can easily be accurately estimated from unlabeled data from the source domains. We give a detailed analysis of our new technique, including general guarantees based on R…

    Submitted 12 February, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

  42. arXiv:2008.07180  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs

    Authors: Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh

    Abstract: We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. Unique challenges to the traditional ERM problem in the context of FL include (i) need to provide privacy guarantees on clients' data, (ii) compress the communication between clients and the server, since client…

    Submitted 23 September, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  43. arXiv:2008.03606  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

    Authors: Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

    Abstract: Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon. In fact, obtaining an algorithm for FL which is uniformly better than simple centralized training has been a major open problem thus far. In this work, we propose a general algorithmic framework, Mime, which i) mitigates cl…

    Submitted 8 June, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

    Comments: Version 2 provides stronger theoretical results and more thorough experiments

    MSC Class: 68W40; 68W15; 90C25; 90C06 ACM Class: G.1.6; F.2.1; E.4

  44. arXiv:2007.13660  [pdf, other]

    cs.LG cs.CR cs.DS cs.IT stat.ML

    Learning discrete distributions: user vs item-level privacy

    Authors: Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

    Abstract: Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy beco…

    Submitted 11 January, 2021; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020, 38 pages

  45. arXiv:2007.09762  [pdf, other]

    cs.LG stat.ML

    A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

    Authors: Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

    Abstract: We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but where the learner has at its disposal a large amount of labeled data from multiple source domains. We show that a new family of algorithms based on model selection ideas benefits from very favorable guar…

    Submitted 29 October, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: 20 pages

  46. arXiv:2006.14950  [pdf, other]

    cs.LG stat.ML

    Relative Deviation Margin Bounds

    Authors: Corinna Cortes, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor. We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity or the empirical $\ell_\infty$ covering number of the hypothesis set used. Furthermore, using our relative deviation margin boun…

    Submitted 28 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: 29 pages

  47. arXiv:2002.10619  [pdf, other]

    cs.LG stat.ML

    Three Approaches for Personalization with Applications to Federated Learning

    Authors: Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh

    Abstract: The standard objective in machine learning is to train a single model for all users. However, in many learning scenarios, such as cloud computing and federated learning, it is possible to learn a personalized model per user. In this work, we present a systematic learning-theoretic study of personalization. We propose and analyze three approaches: user clustering, data interpolation, and model inte…

    Submitted 19 July, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 24 pages

  48. arXiv:1912.04977  [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  49. arXiv:1911.07963  [pdf, other]

    cs.LG cs.CR stat.ML

    Can You Really Backdoor Federated Learning?

    Authors: Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, H. Brendan McMahan

    Abstract: The decentralized nature of federated learning makes detecting and defending against adversarial attacks a challenging task. This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task. Unlike existing works, we allow non-malicious clients to…

    Submitted 2 December, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: To appear at the 2nd International Workshop on Federated Learning for Data Privacy and Confidentiality at NeurIPS 2019

  50. arXiv:1910.06378  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

    Authors: Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

    Abstract: Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow converge…

    Submitted 9 April, 2021; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: v2 contains analysis of FedAvg, non-convex rates of Scaffold, and experimental evaluation. v3 fixes typos, ICML version. v4 slightly improves rate of SCAFFOLD for general convex functions

    MSC Class: 68W40; 68W15; 90C25; 90C06 ACM Class: G.1.6; F.2.1; E.4