Search | arXiv e-print repository

How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization

Authors: Andrew Lowy, Jonathan Ullman, Stephen J. Wright

Abstract: We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-c… ▽ More We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2306.15056 [pdf, other]

Optimal Differentially Private Model Training with Public Data

Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn

Abstract: Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set whi… ▽ More Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of pure and approximate DP. To answer the first question, we prove tight (up to log factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that the optimal error rates can be attained (up to log factors) by either discarding private data and training a public model, or treating public data like it is private and using an optimal DP algorithm. To address the second question, we develop novel algorithms that are "even more optimal" (i.e. better constants) than the asymptotically optimal approaches described above. For local DP mean estimation, our algorithm is \ul{optimal including constants}. Empirically, our algorithms show benefits over the state-of-the-art. △ Less

Submitted 13 February, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: V2 changed the title and added high-dimensional approximate semi-DP lower bounds

arXiv:2209.07403 [pdf, other]

Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses

Authors: Andrew Lowy, Meisam Razaviyayn

Abstract: We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often lead… ▽ More We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk. △ Less

Submitted 27 October, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: Appeared in the International Conference on Algorithmic Learning Theory (ALT) 2023. This version improves the runtime bound in Theorem 6

arXiv:2203.06735 [pdf, other]

Private Non-Convex Federated Learning Without a Trusted Server

Authors: Andrew Lowy, Ali Ghafelebashi, Meisam Razaviyayn

Abstract: We study federated learning (FL) -- especially cross-silo FL -- with non-convex loss functions and data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) must protect the privacy of each person's data (e.g. patient's medical record), even if the server or other silos act as adversarial eavesdroppers. To that end, we consider inter-silo record-level… ▽ More We study federated learning (FL) -- especially cross-silo FL -- with non-convex loss functions and data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) must protect the privacy of each person's data (e.g. patient's medical record), even if the server or other silos act as adversarial eavesdroppers. To that end, we consider inter-silo record-level (ISRL) differential privacy (DP), which requires silo~$i$'s communications to satisfy record/item-level DP. We propose novel ISRL-DP algorithms for FL with heterogeneous (non-i.i.d.) silo data and two classes of Lipschitz continuous loss functions: First, we consider losses satisfying the Proximal Polyak-Lojasiewicz (PL) inequality, which is an extension of the classical PL condition to the constrained setting. In contrast to our result, prior works only considered unconstrained private optimization with Lipschitz PL loss, which rules out most interesting PL losses such as strongly convex problems and linear/logistic regression. Our algorithms nearly attain the optimal strongly convex, homogeneous (i.i.d.) rate for ISRL-DP FL without assuming convexity or i.i.d. data. Second, we give the first private algorithms for non-convex non-smooth loss functions. Our utility bounds even improve on the state-of-the-art bounds for smooth losses. We complement our upper bounds with lower bounds. Additionally, we provide shuffle DP (SDP) algorithms that improve over the state-of-the-art central DP algorithms under more practical trust assumptions. Numerical experiments show that our algorithm has better accuracy than baselines for most privacy levels. All the codes are publicly available at: https://github.com/ghafeleb/Private-NonConvex-Federated-Learning-Without-a-Trusted-Server. △ Less

Submitted 25 June, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

Comments: AISTATS 2023

arXiv:2106.09779 [pdf, other]

Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses

Authors: Andrew Lowy, Meisam Razaviyayn

Abstract: This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the… ▽ More This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks. △ Less

Submitted 14 June, 2023; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: ICLR 2023

arXiv:2002.07919 [pdf, other]

Efficient Search of First-Order Nash Equilibria in Nonconvex-Concave Smooth Min-Max Problems

Authors: Dmitrii M. Ostrovskii, Andrew Lowy, Meisam Razaviyayn

Abstract: We propose an efficient algorithm for finding first-order Nash equilibria in min-max problems of the form $\min_{x \in X}\max_{y\in Y} F(x,y)$, where the objective function is smooth in both variables and concave with respect to $y$; the sets $X$ and $Y$ are convex and "projection-friendly," and $Y$ is compact. Our goal is to find an $(\varepsilon_x,\varepsilon_y)$-first-order Nash equilibrium wit… ▽ More We propose an efficient algorithm for finding first-order Nash equilibria in min-max problems of the form $\min_{x \in X}\max_{y\in Y} F(x,y)$, where the objective function is smooth in both variables and concave with respect to $y$; the sets $X$ and $Y$ are convex and "projection-friendly," and $Y$ is compact. Our goal is to find an $(\varepsilon_x,\varepsilon_y)$-first-order Nash equilibrium with respect to a stationarity criterion that is stronger than the commonly used proximal gradient norm. The proposed approach is fairly simple: we perform approximate proximal-point iterations on the primal function, with inexact oracle provided by Nesterov's algorithm run on the regularized function $F(x_t,\cdot)$, $x_t$ being the current primal iterate. The resulting iteration complexity is $O(\varepsilon_x{}^{-2} \varepsilon_y{}^{-1/2})$ up to a logarithmic factor. As a byproduct, the choice $\varepsilon_y = O(\varepsilon_x{}^2)$ allows for the $O(\varepsilon_x{}^{-3})$ complexity of finding an $\varepsilon_x$-stationary point for the standard Moreau envelope of the primal function. Moreover, when the objective is strongly concave with respect to $y$, the complexity estimate for our algorithm improves to $O(\varepsilon_x{}^{-2}{κ_y}^{1/2})$ up to a logarithmic factor, where $κ_y$ is the condition number appropriately adjusted for coupling. In both scenarios, the complexity estimates are the best known so far, and are only known for the (weaker) proximal gradient norm criterion. Meanwhile, our approach is "user-friendly:" (i) the algorithm is built upon running a variant of Nesterov's accelerated algorithm as subroutine and avoids extragradient steps; (ii) the convergence analysis recycles the well-known results on accelerated methods with inexact oracle. Finally, we extend the approach to non-Euclidean proximal geometries. △ Less

Submitted 2 May, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: 29 pages; accepted to SIAM Journal on Optimization (as of May 2021)

MSC Class: 90C06; 90C25; 90C26; 91A99

Showing 1–6 of 6 results for author: Lowy, A