Showing 1–31 of 31 results for author: Charles, Z

  1. arXiv:2407.07737  [pdf, other]

    cs.LG cs.CL cs.CR cs.DC

    Fine-Tuning Large Language Models with User-Level Differential Privacy

    Authors: Zachary Charles, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Nicole Mitchell, Krishna Pillutla, Keith Rush

    Abstract: We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user. We study two variants of DP-SGD with: (1) example-level sampling (ELS) and per-example gradient clipping, and (2) user-level sampling (ULS) and per-user gradient clipping. We derive a novel use…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2403.07128  [pdf, other]

    cs.DC cs.LG

    FAX: Scalable and Differentiable Federated Primitives in JAX

    Authors: Keith Rush, Zachary Charles, Zachary Garrett

    Abstract: We present FAX, a JAX-based library designed to support large-scale distributed and federated computations in both data center and cross-device applications. FAX leverages JAX's sharding mechanisms to enable native targeting of TPUs and state-of-the-art JAX runtimes, including Pathways. FAX embeds building blocks for federated computations as primitives in JAX. This enables three key benefits. Fir…

    Submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2311.10291  [pdf, other]

    cs.LG

    Leveraging Function Space Aggregation for Federated Learning at Scale

    Authors: Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite

    Abstract: The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, w…

    Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: 23 pages, 10 figures. Transactions on Machine Learning Research, 2024

  4. arXiv:2307.09619  [pdf, other]

    cs.LG cs.DC

    Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning

    Authors: Zachary Charles, Nicole Mitchell, Krishna Pillutla, Michael Reneer, Zachary Garrett

    Abstract: We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existi…

    Submitted 21 December, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Dataset Grouper is available at https://github.com/google-research/dataset_grouper

    Journal ref: NeurIPS 2023 (Datasets & Benchmarks)

  5. arXiv:2302.01463  [pdf, other]

    cs.LG

    Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy

    Authors: Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan

    Abstract: We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise…

    Submitted 15 January, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

  6. arXiv:2301.07806  [pdf, other]

    cs.LG cs.DC cs.SC

    Federated Automatic Differentiation

    Authors: Keith Rush, Zachary Charles, Zachary Garrett

    Abstract: Federated learning (FL) is a general framework for learning across heterogeneous clients while preserving data privacy, under the orchestration of a central server. FL methods often compute gradients of loss functions purely locally (i.e., entirely at each client, or entirely at the server), typically using automatic differentiation (AD) techniques. We propose a federated automatic differentiation (…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: 36 pages, 13 figures

  7. arXiv:2208.09432  [pdf, other]

    cs.LG cs.DC

    Federated Select: A Primitive for Communication- and Memory-Efficient Federated Learning

    Authors: Zachary Charles, Kallista Bonawitz, Stanislav Chiknavaryan, Brendan McMahan, Blaise Agüera y Arcas

    Abstract: Federated learning (FL) is a framework for machine learning across heterogeneous client devices in a privacy-preserving fashion. To date, most FL algorithms learn a "global" server model across multiple rounds. At each round, the same server model is broadcast to all participating clients, updated locally, and then aggregated across clients. In this work, we propose a more general procedure in whi…

    Submitted 19 August, 2022; originally announced August 2022.

  8. arXiv:2206.09262  [pdf, other]

    cs.LG cs.DC

    Motley: Benchmarking Heterogeneity and Personalization in Federated Learning

    Authors: Shanshan Wu, Tian Li, Zachary Charles, Yu Xiao, Ziyu Liu, Zheng Xu, Virginia Smith

    Abstract: Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and…

    Submitted 26 September, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 40 pages, 10 figures, 7 tables. EMNIST and Landmarks fine-tuning results are corrected in (and after) v5. Code: https://github.com/google-research/federated/tree/master/personalization_benchmark

  9. arXiv:2201.02664  [pdf, other]

    cs.LG cs.DC cs.IT stat.ML

    Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory

    Authors: Nicole Mitchell, Johannes Ballé, Zachary Charles, Jakub Konečný

    Abstract: A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server. We present a comprehensive empirical study of the statistics of model updates in FL, as well as the role and benefits of various compression techniques. Motivated by these observations, we propose a novel method to reduce the average communicatio…

    Submitted 19 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  10. arXiv:2109.03973  [pdf, other]

    math.OC cs.DC cs.LG math.CA

    Iterated Vector Fields and Conservatism, with Applications to Federated Learning

    Authors: Zachary Charles, Keith Rush

    Abstract: We study whether iterated vector fields (vector fields composed with themselves) are conservative. We give explicit examples of vector fields for which this self-composition preserves conservatism. Notably, this includes gradient vector fields of loss functions associated with some generalized linear models. As we show, characterizing the set of vector fields satisfying this condition leads to non…

    Submitted 12 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

  11. arXiv:2107.06917  [pdf, other]

    cs.LG

    A Field Guide to Federated Optimization

    Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, et al. (28 additional authors not shown)

    Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…

    Submitted 14 July, 2021; originally announced July 2021.

  12. arXiv:2106.07820  [pdf, other]

    cs.LG cs.DC

    On Large-Cohort Training for Federated Learning

    Authors: Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith

    Abstract: Federated learning methods typically learn a model by iteratively sampling updates from a population of clients. In this work, we explore how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms. Our work poses three fundamental questions. First, what challenges arise when trying to scale f…

    Submitted 14 June, 2021; originally announced June 2021.

  13. arXiv:2106.02305  [pdf, other]

    cs.LG cs.DC stat.ML

    Local Adaptivity in Federated Learning: Convergence and Consistency

    Authors: Jianyu Wang, Zheng Xu, Zachary Garrett, Zachary Charles, Luyang Liu, Gauri Joshi

    Abstract: The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have be…

    Submitted 4 June, 2021; originally announced June 2021.

  14. arXiv:2103.05032  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning

    Authors: Zachary Charles, Jakub Konečný

    Abstract: We study a family of algorithms, which we refer to as local update methods, generalizing many federated and meta-learning algorithms. We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss we exactly characterize. Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition nu…

    Submitted 8 March, 2021; originally announced March 2021.

    Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. PMLR: Volume 130

  15. arXiv:2007.00878  [pdf, other]

    cs.LG math.OC stat.ML

    On the Outsized Importance of Learning Rates in Local Update Methods

    Authors: Zachary Charles, Jakub Konečný

    Abstract: We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate l…

    Submitted 2 July, 2020; originally announced July 2020.

  16. arXiv:2003.00295  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Adaptive Federated Optimization

    Authors: Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have…

    Submitted 8 September, 2021; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Published as a conference paper at ICLR 2021

  17. arXiv:1912.04977  [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  18. arXiv:1907.12205  [pdf, other]

    cs.LG cs.DC stat.ML

    DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

    Authors: Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: To improve the resilience of distributed training to worst-case, or Byzantine node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only have limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but…

    Submitted 7 March, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

  19. arXiv:1905.09209  [pdf, other]

    cs.LG math.OC stat.ML

    Convergence and Margin of Adversarial Training on Separable Data

    Authors: Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

    Abstract: Adversarial training is a technique for training robust machine learning models. To encourage robustness, it iteratively computes adversarial examples for the model, and then re-trains on these examples via some update rule. This work analyzes the performance of adversarial training on linearly separable data, and provides bounds on the number of iterations required for large margin. We show that…

    Submitted 22 May, 2019; originally announced May 2019.

  20. arXiv:1905.03177  [pdf, other]

    cs.LG stat.ML

    Does Data Augmentation Lead to Positive Margin?

    Authors: Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

    Abstract: Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  21. arXiv:1901.09671  [pdf, other]

    cs.LG cs.DC cs.IT math.OC stat.ML

    ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

    Authors: Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay…

    Submitted 28 January, 2019; originally announced January 2019.

  22. arXiv:1811.03531  [pdf, other]

    cs.LG stat.ML

    A Geometric Perspective on the Transferability of Adversarial Directions

    Authors: Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos

    Abstract: State-of-the-art machine learning models frequently misclassify inputs that have been perturbed in an adversarial manner. Adversarial perturbations generated for a given input and a specific classifier often seem to be effective on other inputs and even different classifiers. In other words, adversarial perturbations seem to transfer between different inputs, models, and even different neural netw…

    Submitted 8 November, 2018; originally announced November 2018.

  23. arXiv:1806.04090  [pdf, other]

    stat.ML cs.DC cs.LG

    ATOMO: Communication-efficient Learning via Atomic Sparsification

    Authors: Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos

    Abstract: Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular va…

    Submitted 8 November, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

  24. arXiv:1805.10378  [pdf, other]

    stat.ML cs.DC cs.IT cs.LG stat.CO

    Gradient Coding via the Stochastic Block Model

    Authors: Zachary Charles, Dimitris Papailiopoulos

    Abstract: Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning. Due to the size and scale of modern data, gradient computations are often distributed across multiple compute nodes. Unfortunately, such distributed implementations can face significant delays caused by straggler nodes, i.e., nodes that a…

    Submitted 25 May, 2018; originally announced May 2018.

  25. arXiv:1803.09877  [pdf, other]

    stat.ML cs.DC cs.IT cs.LG cs.NE

    DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

    Authors: Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

    Abstract: Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur…

    Submitted 21 June, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: Accepted by ICML 2018

  26. arXiv:1711.06771  [pdf, other]

    stat.ML cs.DC cs.IT cs.LG stat.CO

    Approximate Gradient Coding via Sparse Random Graphs

    Authors: Zachary Charles, Dimitris Papailiopoulos, Jordan Ellenberg

    Abstract: Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time. Coding-theoretic techniques have been recently proposed to mitigate stragglers via algorithmic redundancy. Prior work in coded computation and gradient coding has mainly focused on exact recovery of the desired output. However, slightly inexact solutions c…

    Submitted 17 November, 2017; originally announced November 2017.

  27. arXiv:1710.08402  [pdf, other]

    stat.ML cs.IT cs.LG math.OC

    Stability and Generalization of Learning Algorithms that Converge to Global Optima

    Authors: Zachary Charles, Dimitris Papailiopoulos

    Abstract: We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions. W…

    Submitted 23 October, 2017; originally announced October 2017.

    Comments: 27 pages, 5 figures

  28. arXiv:1708.08114  [pdf, ps, other]

    math.OC

    Exploiting Algebraic Structure in Global Optimization and the Belgian Chocolate Problem

    Authors: Zachary Charles, Nigel Boston

    Abstract: The Belgian chocolate problem involves maximizing a parameter δ over a non-convex region of polynomials. In this paper we detail a global optimization method for this problem that outperforms previous such methods by exploiting underlying algebraic structure. Previous work has focused on iterative methods that, due to the complicated non-convex feasible region, may require many iterations or resul…

    Submitted 27 August, 2017; originally announced August 2017.

    Comments: 15 pages

  29. arXiv:1707.02461  [pdf, other]

    stat.ML

    Subspace Clustering with Missing and Corrupted Data

    Authors: Zachary Charles, Amin Jalali, Rebecca Willett

    Abstract: Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces. One popular approach, sparse subspace clustering (SSC), represents each sample as a weighted combination of the other samples, with weights of minimal $\ell_1$ nor…

    Submitted 15 January, 2018; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: 31 pages, 2 figures

  30. arXiv:1612.06260  [pdf, ps, other]

    math.NT cs.DS

    Generating Random Factored Ideals in Number Fields

    Authors: Zachary Charles

    Abstract: We present a randomized polynomial-time algorithm to generate a random integer according to the distribution of norms of ideals at most N in any given number field, along with the factorization of the integer. Using this algorithm, we can produce a random ideal in the ring of algebraic integers uniformly at random among ideals with norm up to N, in polynomial time. We also present a variant of thi…

    Submitted 28 June, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

    Comments: 7 pages

    MSC Class: 11Y16

  31. arXiv:1108.4810  [pdf, ps, other]

    math.CO

    Nonpositive Eigenvalues of the Adjacency Matrix and Lower Bounds for Laplacian Eigenvalues

    Authors: Zachary B. Charles, Miriam Farber, Charles R. Johnson, Lee Kennedy-Shaffer

    Abstract: Let $NPO(k)$ be the smallest number $n$ such that the adjacency matrix of any undirected graph with $n$ vertices or more has at least $k$ nonpositive eigenvalues. We show that $NPO(k)$ is well-defined and prove that the values of $NPO(k)$ for $k=1,2,3,4,5$ are $1,3,6,10,16$ respectively. In addition, we prove that for all $k \geq 5$, $R(k,k+1) \ge NPO(k) > T_k$, in which $R(k,k+1)$ is the Ramsey n…

    Submitted 26 May, 2012; v1 submitted 24 August, 2011; originally announced August 2011.

    Comments: 23 pages, 12 figures