
Showing 1–30 of 30 results for author: Feldman, V

Searching in archive stat.
  1. arXiv:2406.19566

    cs.LG cs.CR cs.DS math.ST stat.ML

    Instance-Optimal Private Density Estimation in the Wasserstein Distance

    Authors: Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar

    Abstract: Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work w…

    Submitted 27 June, 2024; originally announced June 2024.
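
    For distributions on a common, evenly spaced 1-D grid, the Wasserstein-1 distance has a closed form: the sum of absolute differences between the two CDFs, times the grid spacing. The sketch below computes it and applies it to a Laplace-noised histogram, a standard epsilon-DP baseline assumed here purely for illustration. It is not the instance-optimal algorithm of this paper, and the function names are hypothetical.

```python
import random


def wasserstein1_on_grid(p, q, spacing=1.0):
    """W1 distance between two probability vectors on the same evenly spaced 1-D grid.

    In one dimension W1 is the integral of |CDF_p - CDF_q|, which on a grid
    reduces to a sum of absolute CDF differences times the grid spacing.
    """
    if len(p) != len(q):
        raise ValueError("p and q must live on the same grid")
    cdf_p = cdf_q = total = 0.0
    for p_i, q_i in zip(p, q):
        cdf_p += p_i
        cdf_q += q_i
        total += abs(cdf_p - cdf_q)
    return total * spacing


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(samples, num_bins, epsilon, rng):
    """Laplace-noised bin counts, clipped at zero and renormalized (hypothetical baseline).

    Adding or removing one sample changes a single count by 1, so Laplace(1/epsilon)
    noise per bin is epsilon-DP; clipping and renormalizing are post-processing.
    """
    counts = [0] * num_bins
    for s in samples:  # each sample is a bin index in range(num_bins)
        counts[s] += 1
    noisy = [max(0.0, c + laplace(1.0 / epsilon, rng)) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / num_bins] * num_bins
    return [v / total for v in noisy]


rng = random.Random(0)
samples = [min(9, max(0, round(rng.gauss(5, 1.5)))) for _ in range(1000)]
true_hist = [samples.count(b) / len(samples) for b in range(10)]
noisy_hist = private_histogram(samples, 10, epsilon=1.0, rng=rng)
print(f"W1(true, private) = {wasserstein1_on_grid(true_hist, noisy_hist):.4f}")
```

    For example, `wasserstein1_on_grid([1, 0, 0], [0, 0, 1])` returns `2.0`: all of the mass moves two grid steps.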

  2. arXiv:2310.00098

    cs.LG cs.CR stat.ML

    Federated Learning with Differential Privacy for End-to-End Speech Recognition

    Authors: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko

    Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, its application to automatic speech recognition (ASR) has so far been limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: Under review

  3. arXiv:2307.15835

    cs.CR cs.DS cs.LG stat.ML

    Mean Estimation with User-level Privacy under Data Heterogeneity

    Authors: Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example, in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Conference version published at NeurIPS 2022

  4. arXiv:2306.04444

    cs.LG cs.CR stat.ML

    Fast Optimal Locally Private Mean Estimation via Random Projections

    Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar

    Abstract: We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity…

    Submitted 26 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Added the correct github link

  5. arXiv:2302.14154

    cs.LG cs.CR math.OC stat.ML

    Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. For the problem of online prediction from experts, we design new algorithms that obtain near-optimal regret ${O} \big( \varepsilon^{-1} \log^{1.5}{d} \big)$ where $d$ is the number of experts. This signif…

    Submitted 27 February, 2023; originally announced February 2023.

  6. arXiv:2210.13537

    cs.LG cs.CR math.OC stat.ML

    Private Online Prediction from Experts: Separations and Faster Rates

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of…

    Submitted 29 June, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Removed the results for the realizable setting, which we have uploaded, together with additional results for that setting, as a separate paper. Added a proof sketch for the lower bound

  7. arXiv:2210.13497

    cs.LG cs.IT math.ST stat.ML

    Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

    Authors: John Duchi, Vitaly Feldman, Lunjia Hu, Kunal Talwar

    Abstract: Recovering linear subspaces from data is a fundamental and important task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem: the principal component analysis (PCA), with a focus on dealing with irregular noise. Our data come from $n$ users with user $i$ contributing data samples from a $d$-dimensional distrib…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: In NeurIPS 2022

  8. arXiv:2208.04591

    cs.CR cs.DS cs.LG stat.ML

    Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy

    Authors: Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: The shuffle model of differential privacy has gained significant interest as an intermediate trust model between the standard local and central models [EFMRTT19; CSUZZ19]. A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees. Such amplification implies substantially stronger privacy guarantees for systems in which data is contribut…

    Submitted 30 October, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Errata added. 14 pages, 4 figures

  9. arXiv:2103.01516

    cs.LG cs.CR math.OC stat.ML

    Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$. The upper bound is based…

    Submitted 2 March, 2021; originally announced March 2021.
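
    The two terms of this rate can be compared numerically. The helper below is a hypothetical illustration that drops constants and the hidden logarithmic factors, and uses the natural logarithm (the base only affects constants). Equating the terms shows that the privacy term $\sqrt{d}/(\varepsilon n)$ dominates until roughly $n \approx d/(\varepsilon^2 \log d)$.

```python
import math


def l1_sco_rate(n, d, epsilon):
    """Order-of-magnitude excess population loss sqrt(log(d)/n) + sqrt(d)/(epsilon*n).

    Constants and hidden logarithmic factors are omitted, so only comparisons
    between settings are meaningful.
    """
    statistical = math.sqrt(math.log(d) / n)  # non-private term, only log-dependent on d
    privacy = math.sqrt(d) / (epsilon * n)    # additional cost of (epsilon, delta)-DP
    return statistical, privacy, statistical + privacy


for n in (10**3, 10**5, 10**7):
    stat, priv, total = l1_sco_rate(n, d=10**6, epsilon=1.0)
    print(f"n={n:>9}: statistical={stat:.4f} privacy={priv:.4f} total={total:.4f}")
```

    With $d = 10^6$ and $\varepsilon = 1$ the crossover sits near $n \approx 7 \times 10^4$, so the first printed row is privacy-dominated and the last is statistics-dominated.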

  10. arXiv:2012.12803

    cs.LG cs.CR cs.DS stat.ML

    Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling

    Authors: Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: Recent work of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta [EFMRTT19] demonstrates that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17] and has led to significant interest in the shuffle model of privacy…

    Submitted 7 September, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Updated to include numerical experiments for Renyi differential privacy
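
    The shuffle model can be simulated in a few lines: each user applies a local randomizer, a trusted shuffler applies a uniformly random permutation, and the analyzer sees only the shuffled reports. This sketch assumes binary randomized response as the local randomizer and a debiased count at the analyzer; the function names are hypothetical, and it shows only the data flow, not the amplified privacy parameter the paper derives.

```python
import math
import random


def randomized_response(bit, epsilon0, rng):
    """epsilon0-LDP binary randomized response: keep the bit w.p. e^eps0 / (e^eps0 + 1)."""
    keep_prob = math.exp(epsilon0) / (math.exp(epsilon0) + 1.0)
    return bit if rng.random() < keep_prob else 1 - bit


def shuffle_reports(bits, epsilon0, rng):
    """Users randomize locally; the shuffler then permutes the reports uniformly at random,
    so the analyzer cannot tell which report came from which user."""
    reports = [randomized_response(b, epsilon0, rng) for b in bits]
    rng.shuffle(reports)
    return reports


def estimate_count(reports, epsilon0):
    """Unbiased estimate of the true number of 1s (requires epsilon0 > 0).

    With keep probability p and c true ones among n users,
    E[reported ones] = c * p + (n - c) * (1 - p), hence
    c = (ones - n * (1 - p)) / (2p - 1).
    """
    p = math.exp(epsilon0) / (math.exp(epsilon0) + 1.0)
    n = len(reports)
    ones = sum(reports)
    return (ones - n * (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(1)
bits = [1] * 3000 + [0] * 7000
reports = shuffle_reports(bits, epsilon0=2.0, rng=rng)
print(round(estimate_count(reports, epsilon0=2.0)))  # an estimate near 3000
```

    Because the analyzer's estimator depends only on the number of reported ones, shuffling costs nothing in accuracy while removing the link between reports and users.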

  11. arXiv:2008.11193

    cs.CR cs.LG stat.ML

    Individual Privacy Accounting via a Renyi Filter

    Authors: Vitaly Feldman, Tijana Zrnic

    Abstract: We consider a sequential setting in which a single dataset of individuals is used to perform adaptively-chosen analyses, while ensuring that the differential privacy loss of each participant does not exceed a pre-specified privacy budget. The standard approach to this problem relies on bounding a worst-case estimate of the privacy loss over all individuals and all possible values of their data, fo…

    Submitted 8 January, 2022; v1 submitted 25 August, 2020; originally announced August 2020.
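
    A simplified picture of individual accounting: for the Gaussian mechanism with noise standard deviation $\sigma$, an individual whose contribution has norm at most $\Delta_i$ incurs a Rényi divergence of order $\alpha$ of at most $\alpha \Delta_i^2 / (2\sigma^2)$ in that step. The hypothetical filter below charges each person their own loss and leaves them out of any step that would push their total past the budget. It assumes Gaussian noise, a single fixed order $\alpha$, and scalar sum queries; it only does the bookkeeping and is not the paper's algorithm or its validity proof.

```python
import random


class IndividualRenyiFilter:
    """Hypothetical per-individual Renyi-loss bookkeeping at a single fixed order alpha."""

    def __init__(self, num_individuals, alpha, budget):
        self.alpha = alpha
        self.budget = budget
        self.spent = [0.0] * num_individuals

    def step_loss(self, sensitivity, sigma):
        # Order-alpha Renyi divergence of the Gaussian mechanism for this individual.
        return self.alpha * sensitivity ** 2 / (2.0 * sigma ** 2)

    def admit(self, sensitivities, sigma):
        """Charge and return the individuals who stay within budget if included this step."""
        admitted = []
        for i, s in enumerate(sensitivities):
            loss = self.step_loss(s, sigma)
            if self.spent[i] + loss <= self.budget:
                self.spent[i] += loss
                admitted.append(i)
        return admitted


def noisy_sum(values, filt, sigma, rng):
    """One analysis: Gaussian-noised sum over admitted individuals.

    For a sum of scalars, individual i's sensitivity is |values[i]|.
    """
    admitted = filt.admit([abs(v) for v in values], sigma)
    return sum(values[i] for i in admitted) + rng.gauss(0.0, sigma), admitted


filt = IndividualRenyiFilter(num_individuals=3, alpha=2.0, budget=1.0)
rng = random.Random(0)
for step in range(3):
    answer, admitted = noisy_sum([1.0, 0.1, 0.0], filt, sigma=1.0, rng=rng)
    print(step, admitted)
```

    Individual 0 exhausts the budget in the first step and is excluded afterwards, while individuals with small contributions keep participating; a worst-case accountant would instead charge everyone as if they were individual 0.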

  12. arXiv:2008.03703

    cs.LG stat.ML

    What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

    Authors: Vitaly Feldman, Chiyuan Zhang

    Abstract: Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanatio…

    Submitted 9 August, 2020; originally announced August 2020.
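
    The memorization score this line of work builds on compares accuracy on an example when it is in the training set versus when it is held out: $\mathrm{mem}(i) = \Pr[h(x_i) = y_i \mid i \in S] - \Pr[h(x_i) = y_i \mid i \notin S]$. A subsampled estimate trains many models on random subsets and averages over the two groups. The sketch below assumes a toy 1-nearest-neighbour learner in place of a deep network; all names, subset sizes, and model counts are illustrative, not the paper's experimental setup.

```python
import random


def train_1nn(train):
    """Toy learner (hypothetical stand-in for a neural network): 1-NN on 1-D points."""
    def predict(x):
        return min(train, key=lambda ex: abs(ex[0] - x))[1]
    return predict


def estimate_memorization(data, subset_size, num_models, rng):
    """Subsampled estimate of mem(i) for every example (x_i, y_i) in data.

    Trains num_models models on random subsets and compares, for each example,
    the accuracy of models that saw it against models that did not.
    """
    n = len(data)
    hits_in, count_in = [0] * n, [0] * n
    hits_out, count_out = [0] * n, [0] * n
    for _ in range(num_models):
        idx = set(rng.sample(range(n), subset_size))
        model = train_1nn([data[j] for j in idx])
        for i, (x, y) in enumerate(data):
            correct = model(x) == y
            if i in idx:
                hits_in[i] += correct
                count_in[i] += 1
            else:
                hits_out[i] += correct
                count_out[i] += 1
    scores = []
    for i in range(n):
        if count_in[i] == 0 or count_out[i] == 0:
            scores.append(None)  # too few models on one side to estimate
        else:
            scores.append(hits_in[i] / count_in[i] - hits_out[i] / count_out[i])
    return scores


data = [(i / 20, 0) for i in range(20)] + [(10 + i / 20, 1) for i in range(20)]
data.append((0.525, 1))  # a mislabeled point inside the label-0 cluster
scores = estimate_memorization(data, subset_size=20, num_models=50, rng=random.Random(0))
print("outlier:", scores[-1], "typical point:", scores[20])
```

    The mislabeled point can only be classified correctly when it is memorized, so its score should be close to 1, while a point deep inside a cluster should score close to 0.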

  13. arXiv:2006.06914

    cs.LG math.OC stat.ML

    Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

    Authors: Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

    Abstract: Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important prog…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 32 pages

    MSC Class: 90-08 ACM Class: F.2.1; G.1.6; G.3

  14. arXiv:2005.04763

    cs.LG cs.CR math.OC stat.ML

    Private Stochastic Convex Optimization: Optimal Rates in Linear Time

    Authors: Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given $n$ samples. Unfortunately, their algorithm achieving this bound is relatively in…

    Submitted 10 May, 2020; originally announced May 2020.
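
    For orientation, a common building block of private SCO algorithms is projected SGD with clipped, Gaussian-noised gradients. The sketch below makes a single pass over the samples (linear time) and returns the average iterate. It is a generic illustration, not the paper's algorithm: the noise scale `sigma` is a free parameter, calibrating it to a target $(\varepsilon,\delta)$ is deliberately left out, and the helper names are hypothetical.

```python
import math
import random


def project_to_ball(w, radius):
    """Euclidean projection onto the ball of the given radius (the convex domain)."""
    norm = math.sqrt(sum(v * v for v in w))
    return w if norm <= radius else [v * radius / norm for v in w]


def clip(g, max_norm):
    """Rescale g so its L2 norm is at most max_norm, bounding each sample's influence."""
    norm = math.sqrt(sum(v * v for v in g))
    return g if norm <= max_norm else [v * max_norm / norm for v in g]


def noisy_projected_sgd(samples, grad, dim, step, sigma, clip_norm, radius, rng):
    """One pass of projected SGD on clipped gradients plus N(0, sigma^2) noise per coordinate.

    Each sample is used exactly once; returns the running average of the iterates.
    """
    w = [0.0] * dim
    avg = [0.0] * dim
    for t, z in enumerate(samples, start=1):
        g = clip(grad(w, z), clip_norm)
        noisy = [g_j + rng.gauss(0.0, sigma) for g_j in g]
        w = project_to_ball([w_j - step * n_j for w_j, n_j in zip(w, noisy)], radius)
        avg = [a + (w_j - a) / t for a, w_j in zip(avg, w)]
    return avg


data_rng = random.Random(0)
data = [[data_rng.gauss(0.5, 0.1)] for _ in range(5000)]
# Squared loss 0.5 * (w - z)^2 has gradient w - z, so the population minimizer is E[z] = 0.5.
w_hat = noisy_projected_sgd(data, lambda w, z: [w[0] - z[0]], dim=1, step=0.01,
                            sigma=1.0, clip_norm=2.0, radius=1.0, rng=random.Random(1))
print(w_hat)
```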

  15. arXiv:1911.10541

    cs.LG cs.CR cs.DS stat.ML

    PAC learning with stable and private predictions

    Authors: Yuval Dagan, Vitaly Feldman

    Abstract: We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. Specifically, we consider the notions of uniform stability (Bousquet and Elisseeff, 2001) and prediction privacy (Dwork and Feldman, 2018). Previous work on these notions shows how they can be achieved in the standard PAC model via simple aggregation of models…

    Submitted 23 September, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

  16. arXiv:1911.04014

    cs.LG cs.CR cs.DS stat.ML

    Interaction is necessary for distributed learning with privacy or communication constraints

    Authors: Yuval Dagan, Vitaly Feldman

    Abstract: Local differential privacy (LDP) is a model where users send privatized data to an untrusted central server whose goal is to solve some data analysis task. In the non-interactive version of this model the protocol consists of a single round in which a server sends requests to all users and then receives their responses. This version is deployed in industry due to its practical advantages and has attra…

    Submitted 23 September, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  17. arXiv:1908.09970

    cs.LG cs.CR cs.DS stat.ML

    Private Stochastic Convex Optimization with Optimal Rates

    Authors: Raef Bassily, Vitaly Feldman, Kunal Talwar, Abhradeep Thakurta

    Abstract: We study differentially private (DP) algorithms for stochastic convex optimization (SCO). In this problem the goal is to approximately minimize the population loss given i.i.d. samples from a distribution over convex and Lipschitz loss functions. A long line of existing work on private convex optimization focuses on the empirical loss and derives asymptotically tight bounds on the excess empirical…

    Submitted 26 August, 2019; originally announced August 2019.

  18. arXiv:1906.05271

    cs.LG stat.ML

    Does Learning Require Memorization? A Short Tale about a Long Tail

    Authors: Vitaly Feldman

    Abstract: State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set and are known to fit well even random labels. This tendency to memorize the labels of the training data is not explained by existing theoretical analyses. Memorization of the training data also presents significant privacy risks when the trainin…

    Submitted 10 January, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Significant revision: revised introduction/overview; added formal treatment of noise in the labels and explanation for the disparate effects of limiting memorization

  19. arXiv:1905.10360

    cs.LG cs.DS stat.ML

    The advantages of multiple classes for reducing overfitting from test set reuse

    Authors: Vitaly Feldman, Roy Frostig, Moritz Hardt

    Abstract: Excessive reuse of holdout data can lead to overfitting. However, there is little concrete evidence of significant overfitting due to holdout reuse in popular multiclass benchmarks today. Known results show that, in the worst case, revealing the accuracy of $k$ adaptively chosen classifiers on a data set of size $n$ allows one to create a classifier with bias of $\Theta(\sqrt{k/n})$ for any binary predicti…

    Submitted 24 May, 2019; originally announced May 2019.
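
    The binary worst case mentioned in the abstract is achieved by a simple attack that is easy to simulate: query the holdout accuracy of $k$ random classifiers, keep those that score above $1/2$, and output their majority vote. This hypothetical simulation uses purely random labels, so no classifier can truly beat $1/2$ and any gain on the holdout is overfitting. It illustrates only the binary setting the abstract cites, not the paper's multiclass analysis.

```python
import random


def accuracy(pred, labels):
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


def holdout_reuse_attack(n, k, rng):
    """Overfit a size-n holdout set using k accuracy queries (binary labels).

    Labels are uniformly random, so every classifier's true accuracy is 1/2.
    A classifier is a vector of predicted labels for the n points; an independent
    fresh label vector for the same points measures true accuracy.
    """
    holdout = [rng.getrandbits(1) for _ in range(n)]
    fresh = [rng.getrandbits(1) for _ in range(n)]
    kept = []
    for _ in range(k):
        guess = [rng.getrandbits(1) for _ in range(n)]
        if accuracy(guess, holdout) > 0.5:  # the only feedback the analyst uses
            kept.append(guess)
    if kept:
        final = [int(2 * sum(votes) > len(kept)) for votes in zip(*kept)]
    else:  # no query beat 1/2; fall back to a constant classifier
        final = [0] * n
    return accuracy(final, holdout), accuracy(final, fresh)


holdout_acc, fresh_acc = holdout_reuse_attack(n=1000, k=1000, rng=random.Random(0))
print(f"holdout accuracy {holdout_acc:.3f} vs fresh accuracy {fresh_acc:.3f}")
```

    The holdout accuracy ends up well above $1/2$ while accuracy on fresh labels stays near $1/2$; increasing $k$ relative to $n$ widens the gap.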

  20. arXiv:1902.10710

    cs.LG cs.DS stat.ML

    High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

    Authors: Vitaly Feldman, Jan Vondrak

    Abstract: Algorithmic stability is a classical approach to understanding and analysis of the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002) albeit at the expense of an additional $\sqrt{n}$ factor…

    Submitted 23 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: this is a follow-up to and has minor text overlap with arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019

  21. arXiv:1812.09859

    cs.LG cs.DS stat.ML

    Generalization Bounds for Uniformly Stable Algorithms

    Authors: Vitaly Feldman, Jan Vondrak

    Abstract: Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss function with range bounded in $[0,1]$, the generalization error of a $\gamma$-uniformly stable learning algorithm on $n$ samples is known to be within $O((\gamma+1/n) \sqrt{n \log(1/\delta)})$ of…

    Submitted 18 March, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

    Comments: Appeared in Neural Information Processing Systems (NeurIPS), 2018
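
    Evaluating the classical bound quoted above shows why it is informative only when $\gamma = o(1/\sqrt{n})$: at $\gamma = 1/\sqrt{n}$ it equals $(1 + 1/\sqrt{n})\sqrt{\log(1/\delta)}$, which exceeds 1 for any $\delta < 1/e$ and is therefore vacuous for a loss in $[0,1]$. The helper below drops the constant hidden in $O(\cdot)$ and is purely illustrative.

```python
import math


def bousquet_elisseeff_bound(gamma, n, delta):
    """(gamma + 1/n) * sqrt(n * log(1/delta)): the O(.) expression with its constant dropped."""
    return (gamma + 1.0 / n) * math.sqrt(n * math.log(1.0 / delta))


n, delta = 10_000, 0.01
for label, gamma in (("1/n", 1.0 / n), ("1/sqrt(n)", 1.0 / math.sqrt(n))):
    print(f"gamma = {label:>10}: bound = {bousquet_elisseeff_bound(gamma, n, delta):.4f}")
```

    With $n = 10^4$ and $\delta = 0.01$, the bound is about 0.04 at $\gamma = 1/n$ but above 2 at $\gamma = 1/\sqrt{n}$.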

  22. arXiv:1811.12469

    cs.LG cs.CR cs.DS stat.ML

    Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

    Authors: Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Abhradeep Thakurta

    Abstract: Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to…

    Submitted 25 July, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Stated amplification bounds for epsilon > 1 explicitly and also stated the bounds for Renyi DP. Fixed an incorrect statement in one of the proofs

  23. arXiv:1809.09165

    cs.LG cs.DS stat.ML

    Locally Private Learning without Interaction Requires Separation

    Authors: Amit Daniely, Vitaly Feldman

    Abstract: We consider learning under the constraint of local differential privacy (LDP). For many learning problems known efficient algorithms in this model require many rounds of communication between the server and the clients holding the data points. Yet multi-round protocols are prohibitively slow in practice due to network latency and, as a result, currently deployed large-scale systems are limited to…

    Submitted 28 October, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

  24. arXiv:1808.06651

    cs.LG cs.CR cs.DS stat.ML

    Privacy Amplification by Iteration

    Authors: Vitaly Feldman, Ilya Mironov, Kunal Talwar, Abhradeep Thakurta

    Abstract: Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing of…

    Submitted 10 December, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

    Comments: Extended abstract appears in Foundations of Computer Science (FOCS) 2018

  25. arXiv:1803.10266

    cs.LG cs.DS stat.ML

    Privacy-preserving Prediction

    Authors: Cynthia Dwork, Vitaly Feldman

    Abstract: Ensuring differential privacy of models learned from sensitive user data is an important goal that has been studied extensively in recent years. It is now known that for some basic learning problems, especially those involving high-dimensional data, producing an accurate private model requires much more data than learning without privacy. At the same time, in many applications it is not necessary…

    Submitted 8 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2018

  26. arXiv:1706.05069

    cs.LG cs.DS stat.ML

    Generalization for Adaptively-chosen Estimators via Stable Median

    Authors: Vitaly Feldman, Thomas Steinke

    Abstract: Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this pr…

    Submitted 15 June, 2017; originally announced June 2017.

    Comments: To appear in Conference on Learning Theory (COLT) 2017
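
    The general recipe behind median-based aggregation can be sketched as: split the data into $k$ disjoint chunks, run an arbitrary estimator on each chunk, and release a differentially private median of the $k$ estimates, so that any single data point influences only one of them. Here the private median is the exponential mechanism over a fixed candidate grid, an assumption made for illustration; the paper's stable-median construction and its guarantees may differ, and the helper names are hypothetical.

```python
import math
import random
import statistics


def exponential_mechanism_median(values, candidates, epsilon, rng):
    """epsilon-DP median over a fixed candidate grid.

    Utility u(r) = -|#{v <= r} - k/2| changes by at most 1 when one value changes,
    so sampling r with probability proportional to exp(epsilon * u(r) / 2) is epsilon-DP.
    """
    k = len(values)
    utilities = [-abs(sum(v <= r for v in values) - k / 2) for r in candidates]
    top = max(utilities)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (u - top) / 2.0) for u in utilities]
    return rng.choices(candidates, weights=weights, k=1)[0]


def stable_median_estimate(data, estimator, num_chunks, candidates, epsilon, rng):
    """Run `estimator` on disjoint random chunks and privately aggregate the results."""
    shuffled = data[:]
    rng.shuffle(shuffled)  # data-independent partition; each point lands in one chunk
    chunks = [shuffled[j::num_chunks] for j in range(num_chunks)]
    estimates = [estimator(chunk) for chunk in chunks]
    return exponential_mechanism_median(estimates, candidates, epsilon, rng)


data_rng = random.Random(0)
data = [data_rng.gauss(3.0, 1.0) for _ in range(10_000)]
grid = [i / 10 for i in range(61)]  # candidate outputs 0.0, 0.1, ..., 6.0
est = stable_median_estimate(data, statistics.mean, num_chunks=50,
                             candidates=grid, epsilon=2.0, rng=random.Random(1))
print(est)
```

    Because the estimator only ever sees one chunk, it can be any (even adaptively chosen) procedure; privacy comes entirely from the aggregation step.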

  27. arXiv:1611.06475

    cs.LG stat.ML

    Dealing with Range Anxiety in Mean Estimation via Statistical Queries

    Authors: Vitaly Feldman

    Abstract: We give algorithms for estimating the expectation of a given real-valued function $\varphi: X \to \mathbf{R}$ on a sample drawn randomly from some unknown distribution $D$ over domain $X$, namely $\mathbf{E}_{\mathbf{x}\sim D}[\varphi(\mathbf{x})]$. Our algorithms work in two well-studied models of restricted access to data samples. The first one is the statistical query (SQ) model in which an algorithm has access to an SQ…

    Submitted 25 August, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

  28. arXiv:1608.04414

    cs.LG stat.ML

    Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

    Authors: Vitaly Feldman

    Abstract: In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \mathbf{E}_{\mathbf{f}\sim D}[\mathbf{f}(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$ where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is commonly based on i.i.d. samples $f^1,f^2,\ldots,f^n$ from $D$. A standard approach to…

    Submitted 26 December, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

    Comments: Added illustrations of functions used in some of the constructions

  29. arXiv:1608.02198

    cs.LG cs.CC stat.ML

    A General Characterization of the Statistical Query Complexity

    Authors: Vitaly Feldman

    Abstract: Statistical query (SQ) algorithms are algorithms that have access to an SQ oracle for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $\varphi: X \to [-1,1]$, the oracle returns an estimate of $\mathbf{E}_{x\sim D}[\varphi(x)]$ within some tolerance $\tau_\varphi$ that roughly corresponds to the number of samples. In this work we demonstrate that the complexity of so…

    Submitted 17 April, 2017; v1 submitted 7 August, 2016; originally announced August 2016.

    Comments: Minor revision
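
    An SQ oracle can be simulated from i.i.d. samples, which is where the rough correspondence between tolerance and sample size comes from: by Hoeffding's inequality, for a query with range $[-1,1]$ the mean of $n \ge 2\ln(2/\beta)/\tau^2$ fresh samples lies within $\tau$ of the true expectation with probability at least $1-\beta$. The sketch below is a hypothetical simulation under that assumption; since a real SQ oracle may return any value within the tolerance, an optional flag adds a bounded arbitrary shift.

```python
import math
import random


def samples_for_tolerance(tau, beta):
    """Hoeffding: n >= 2 ln(2/beta) / tau^2 samples suffice for queries with range [-1, 1]."""
    return math.ceil(2.0 * math.log(2.0 / beta) / tau ** 2)


def make_sq_oracle(sampler, tau, beta, rng, adversarial=False):
    """Hypothetical oracle answering E_{x~D}[phi(x)] within tau (w.p. >= 1 - beta per query).

    Each query draws fresh samples. With adversarial=True the empirical mean is made
    accurate to tau/2 and then shifted by up to tau/2, so the total error stays within tau.
    """
    inner_tau = tau / 2.0 if adversarial else tau
    n = samples_for_tolerance(inner_tau, beta)

    def oracle(phi):
        total = 0.0
        for _ in range(n):
            value = phi(sampler(rng))
            if not -1.0 <= value <= 1.0:
                raise ValueError("SQ queries must map into [-1, 1]")
            total += value
        estimate = total / n
        if adversarial:
            estimate += rng.uniform(-tau / 2.0, tau / 2.0)
        return estimate

    return oracle


oracle = make_sq_oracle(lambda r: r.random(), tau=0.05, beta=0.01, rng=random.Random(0))
answer = oracle(lambda x: 2 * x - 1)  # E[2U - 1] = 0 for U uniform on [0, 1]
print(answer)
```

    For instance, `samples_for_tolerance(0.1, 0.05)` returns `738`: halving the tolerance roughly quadruples the sample cost.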

  30. arXiv:1307.3102

    cs.LG cs.DS stat.ML

    Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy

    Authors: Maria Florina Balcan, Vitaly Feldman

    Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and are differentially private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns (1993).…

    Submitted 5 November, 2014; v1 submitted 11 July, 2013; originally announced July 2013.

    Comments: Extended abstract appears in NIPS 2013