-
Privacy Amplification via Iteration for Shuffled and Online PNSGD
Authors:
Matteo Sordello,
Zhiqi Bu,
Jinshuo Dong
Abstract:
In this paper, we consider the framework of privacy amplification via iteration, which was originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work studies the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates. A limitation of the existing literature is that only early-stopped PNSGD has been studied, while no result has been proved for the more widely used PNSGD applied to a shuffled dataset. Moreover, no scheme has yet been proposed for decreasing the injected noise when new data are received in an online fashion. In this work, we first prove a privacy guarantee for shuffled PNSGD, which we investigate asymptotically when the noise is fixed for each sample size $n$ but reduced at a predetermined rate as $n$ increases, in order to achieve convergence of the privacy loss. We then analyze the online setting and provide a faster-decaying scheme for the magnitude of the injected noise that also guarantees convergence of the privacy loss.
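To make the object of study concrete, here is a minimal sketch of one run of shuffled PNSGD in which only the final iterate is released. It assumes, purely for illustration, a squared loss for linear regression, a constant step size `lr`, Gaussian noise of standard deviation `sigma` added to each gradient, and projection onto an L2 ball of radius `radius`. The names `shuffled_pnsgd` and `project_l2_ball` are hypothetical, and the sketch does not implement the paper's noise-reduction schedule or its privacy accounting.

```python
import numpy as np

def project_l2_ball(w, radius):
    """Euclidean projection onto the ball {w : ||w||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def shuffled_pnsgd(X, y, lr, sigma, radius, epochs=1, seed=0):
    """Projected noisy SGD over shuffled passes of (X, y); returns only the last iterate.

    Hypothetical illustration: squared loss 0.5 * (x_i^T w - y_i)^2, fixed lr,
    fixed Gaussian noise level sigma for this sample size, L2-ball constraint.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit each sample once, in shuffled order
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the per-sample squared loss
            noise = rng.normal(0.0, sigma, size=d)  # injected Gaussian noise
            w = project_l2_ball(w - lr * (grad + noise), radius)
    return w  # intermediate iterates are never released ("hidden")
```

Keeping `sigma` fixed inside a run mirrors the abstract's setting in which the noise is fixed for each sample size $n$; how it should shrink as $n$ grows is exactly what the paper analyzes and is not guessed here.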
Submitted 20 June, 2021;
originally announced June 2021.
-
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Authors:
Matteo Sordello,
Niccolò Dalmasso,
Hangfeng He,
Weijie Su
Abstract:
This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, whenever the iterates are likely to bounce around in a vicinity of a local minimum. The detection is performed by splitting the single thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy to implement and incurs essentially no additional computational cost compared with standard SGD. Through a series of extensive experiments, we show that this method is appropriate for both convex problems and training (non-convex) neural networks, with performance that compares favorably to other stochastic optimization methods. Importantly, this method is observed to be very robust with a set of default parameters for a wide range of problems and, moreover, can yield better generalization performance than other adaptive gradient methods such as Adam.
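The splitting diagnostic can be sketched roughly as follows. This is a simplified illustration, not the paper's exact procedure: the window structure, the defaults `n_windows`, `window_len`, `q`, and `decay`, the rule "declare stationarity if at least a fraction `q` of window inner products are negative", and restarting from the average of the two threads are all assumptions made for this sketch. `grad_fn(theta, rng)` is a hypothetical user-supplied stochastic gradient oracle.

```python
import numpy as np

def split_diagnostic(grad_fn, theta, lr, rng, n_windows=10, window_len=5, q=0.4):
    """Run two independent SGD threads from theta; return (is_stationary, restart_point)."""
    th1, th2 = theta.copy(), theta.copy()
    negative = 0
    for _ in range(n_windows):
        g1_sum = np.zeros_like(theta)
        g2_sum = np.zeros_like(theta)
        for _ in range(window_len):
            g1 = grad_fn(th1, rng)
            g2 = grad_fn(th2, rng)
            th1 -= lr * g1
            th2 -= lr * g2
            g1_sum += g1
            g2_sum += g2
        # A negative inner product means the two threads push in conflicting
        # directions, which suggests the iterates are bouncing around a minimum.
        if g1_sum @ g2_sum < 0:
            negative += 1
    return negative >= q * n_windows, (th1 + th2) / 2

def split_sgd(grad_fn, theta0, lr=0.1, decay=0.5, n_rounds=20, sgd_steps=100, seed=0):
    """Alternate plain SGD phases with the diagnostic; shrink lr when stationarity is detected."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_rounds):
        for _ in range(sgd_steps):  # ordinary single-thread SGD phase
            theta -= lr * grad_fn(theta, rng)
        stationary, theta = split_diagnostic(grad_fn, theta, lr, rng)
        if stationary:
            lr *= decay  # stationary phase detected: decrease the learning rate
    return theta, lr
```

Because the diagnostic only reuses gradient evaluations that also advance the two threads, its overhead relative to plain SGD is small, which is consistent with the abstract's claim about computational cost.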
Submitted 16 February, 2024; v1 submitted 18 October, 2019;
originally announced October 2019.
-
A Bernstein type inequality for sums of selections from three dimensional arrays
Authors:
Debapratim Banerjee,
Matteo Sordello
Abstract:
We consider the three-dimensional array $\mathcal{A} = \{a_{i,j,k}\}_{1\le i,j,k \le n}$, with $a_{i,j,k} \in [0,1]$, and the two random statistics $T_{1}:= \sum_{i=1}^n \sum_{j=1}^n a_{i,j,\sigma(i)}$ and $T_{2}:= \sum_{i=1}^{n} a_{i,\sigma(i),\pi(i)}$, where $\sigma$ and $\pi$ are chosen independently from the set of permutations of $\{1,2,\ldots,n\}$. These can be viewed as natural three-dimensional generalizations of the statistic $T_{3}=\sum_{i=1}^{n} a_{i,\sigma(i)}$, considered by Hoeffding \cite{Hoe51}. Here we give Bernstein-type concentration inequalities for $T_{1}$ and $T_{2}$ by extending the argument for concentration of $T_{3}$ by Chatterjee \cite{Cha05}.
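As a quick illustration of the two statistics, the sketch below computes $T_1$ and $T_2$ with NumPy indexing (0-based) and simulates them, assuming $\sigma$ and $\pi$ are drawn uniformly at random. Under that assumption, since $\sigma(i)$ and $\pi(i)$ are each uniform on $\{1,\ldots,n\}$ and independent, $\mathbb{E}[T_1] = \frac{1}{n}\sum_{i,j,k} a_{i,j,k}$ and $\mathbb{E}[T_2] = \frac{1}{n^2}\sum_{i,j,k} a_{i,j,k}$. The helper names `t1_t2` and `monte_carlo` are hypothetical; the code only shows the empirical concentration around these means and does not implement the Bernstein-type bounds.

```python
import numpy as np

def t1_t2(A, sigma, pi):
    """Compute (T1, T2) for an n x n x n array A and permutations sigma, pi (0-based)."""
    i = np.arange(A.shape[0])
    # The advanced indices i and sigma are separated by a slice, so the result
    # has shape (n, n) with entry [k, j] = a_{k, j, sigma(k)}.
    t1 = A[i, :, sigma].sum()
    # Entry k of A[i, sigma, pi] is a_{k, sigma(k), pi(k)}.
    t2 = A[i, sigma, pi].sum()
    return t1, t2

def monte_carlo(A, n_draws=5000, seed=0):
    """Draw independent uniform permutations sigma, pi; return an (n_draws, 2) array of (T1, T2)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    return np.array(
        [t1_t2(A, rng.permutation(n), rng.permutation(n)) for _ in range(n_draws)]
    )
```

For an array `A` of shape `(n, n, n)`, `monte_carlo(A)[:, 0].mean()` should be close to `A.sum() / n` and `monte_carlo(A)[:, 1].mean()` close to `A.sum() / n**2`.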
Submitted 19 July, 2019;
originally announced July 2019.
-
Clustering dynamics in a class of normalised generalised gamma dependent priors
Authors:
Matteo Ruggiero,
Matteo Sordello
Abstract:
Normalised generalised gamma processes are random probability measures that induce nonparametric prior distributions widely used in Bayesian statistics, particularly for mixture modelling. We construct a class of dependent normalised generalised gamma priors induced by a stationary population model of Moran type, which exploits a generalised Pólya urn scheme associated with the prior. We study the asymptotic scaling for the dynamics of the number of clusters in the sample, which in turn provides a dynamic measure of diversity in the underlying population. The limit is characterised as a positive nonstationary diffusion process which falls outside well-known families, with unbounded drift and an entrance boundary at the origin. We also introduce a new class of stationary positive diffusions, whose invariant measures are explicit and have power-law tails, which weakly approximate the scaling limit.
Submitted 4 November, 2016; v1 submitted 2 August, 2016;
originally announced August 2016.