We review the one-dimensional, sample-level private coarse estimator given in KSU20:
Pure DP Range Estimator (PDPRE):
1. If , return any point in . Otherwise, set parameter .
2. Divide into buckets: .
3. Run Pure DP Histogram for over the above buckets.
4. Let be the bucket that has the maximum number of points.
5. Return .
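The steps above can be sketched in Python. This is a minimal illustration rather than the exact procedure of KSU20: the names `half_range`, `bucket_width`, and `epsilon`, the form of the step 1 check, the clamping of out-of-range samples, and the use of Laplace noise with scale `2 / epsilon` as the histogram mechanism are all assumptions made for this sketch.

```python
import math
import random


def laplace(scale, rng):
    """Draw one Laplace(0, scale) variate as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def pure_dp_range_estimate(samples, half_range, bucket_width, epsilon, rng=None):
    """Return the midpoint of the noisily heaviest bucket of [-half_range, half_range].

    All parameter names are hypothetical; the source's exact parameters are not used.
    """
    if rng is None:
        rng = random.Random()
    if half_range <= bucket_width:
        # Step 1 (assumed form): the range is already coarse, so any point works.
        return 0.0
    # Step 2: split [-half_range, half_range] into equal-width buckets.
    num_buckets = math.ceil(2 * half_range / bucket_width)
    counts = [0] * num_buckets
    for x in samples:
        # Assumption: out-of-range samples are clamped into the boundary buckets.
        clamped = min(max(x, -half_range), half_range)
        idx = min(int((clamped + half_range) / bucket_width), num_buckets - 1)
        counts[idx] += 1
    # Step 3: replacing one sample changes at most two counts by 1 each, so the
    # L1 sensitivity is 2 and Laplace noise of scale 2/epsilon gives epsilon-DP.
    noisy = [c + laplace(2.0 / epsilon, rng) for c in counts]
    # Step 4: pick the bucket with the largest noisy count.
    best = max(range(num_buckets), key=noisy.__getitem__)
    # Step 5: return that bucket's midpoint.
    return -half_range + (best + 0.5) * bucket_width
```

For example, calling `pure_dp_range_estimate(data, half_range=100.0, bucket_width=4.0, epsilon=1.0)` on data concentrated around its mean should return a point within about one bucket width of that mean, provided the noise is small relative to the heaviest bucket's count.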
{theorem}
[Sample-Level Coarse Estimation]
For all , PDPRE is -DP. Furthermore, suppose is a distribution over with mean and -th moment bounded by . Then there exists a small constant \colorblue(that gets smaller as gets bigger...) such that, for all , there exists
|
|
|
such that, if , then with probability at least ,
|
|
|
{proof}
[Proof]
Note that the proof of privacy follows directly from \colorblue cite. Thus, the rest of this proof is dedicated to the proof of accuracy. If , by step , the coarse estimate will be within of . Otherwise, we show that , which implies that .
We first show that, with probability at least , the heaviest (non-noisy) bucket in (i.e. the bucket with the most samples) must intersect with . If the (noisy) bucket discovered in our algorithm is also the heaviest non-noisy bucket, then this would immediately imply .
To prove this, it suffices to show that at most samples fall outside of . This suffices because the heaviest bucket not intersecting with would then contain at most samples, whereas the heaviest bucket that intersects with would contain more than samples (at least samples).
We begin by calculating the expected number of samples that fall outside of the interval . If we set , then this is equivalent to calculating :
|
|
|
where the last inequality comes from the bounded -th moment assumption and Markov's inequality. Thus, we can show that, with probability at most , more than samples fall outside of :
|
|
|
by a Chernoff bound, so long as for some constant and .
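The Markov step in this argument can be checked numerically. The sketch below is only an illustration under assumed values: the function names, the choice of standard normal samples with k = 2, and the radius 3 are not taken from the source. It compares the Markov bound on the probability of falling outside an interval around the mean with the fraction of samples observed outside it.

```python
import random


def markov_tail_bound(moment_bound, radius, k):
    """Upper bound on P(|X - mu| > radius) given E|X - mu|^k <= moment_bound."""
    return min(1.0, moment_bound / radius ** k)


def empirical_tail_fraction(samples, center, radius):
    """Fraction of samples lying outside [center - radius, center + radius]."""
    outside = sum(1 for x in samples if abs(x - center) > radius)
    return outside / len(samples)


rng = random.Random(0)
# Assumed example: standard normal samples, mean 0, second central moment 1 (k = 2).
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
bound = markov_tail_bound(moment_bound=1.0, radius=3.0, k=2)
fraction = empirical_tail_fraction(samples, center=0.0, radius=3.0)
print(f"Markov bound: {bound:.4f}, empirical fraction: {fraction:.4f}")
```

The bound is typically loose for light-tailed data; the proof only needs it to hold, and then uses concentration of the count of outside samples around its expectation.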
Finally, we show that, with probability at least , the heaviest non-noisy bucket is also the heaviest noisy bucket. By Lemma (\colorblue cite), with probability at least , the largest magnitude of the noise in any bucket will not exceed , so long as . Thus, the heaviest non-noisy bucket remains the heaviest bucket after noise is added to all of the buckets, completing the proof.
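The noise-magnitude step can be illustrated with a short simulation. This sketch assumes the histogram adds independent Laplace noise of a given scale to each bucket, which may differ from the mechanism in the cited lemma; `scale`, `num_buckets`, and `beta` are hypothetical names. It uses the exact tail P(|Lap(b)| > t) = exp(-t/b) and a union bound over the buckets.

```python
import math
import random


def laplace_max_noise_bound(scale, num_buckets, beta):
    """Level t such that P(max_i |Lap_i(scale)| > t) <= beta, by a union bound."""
    return scale * math.log(num_buckets / beta)


def simulate_max_noise(scale, num_buckets, rng):
    """Largest absolute Laplace noise over num_buckets independent draws."""
    return max(
        abs(scale * (rng.expovariate(1.0) - rng.expovariate(1.0)))
        for _ in range(num_buckets)
    )


rng = random.Random(1)
scale, num_buckets, beta = 2.0, 50, 0.05
threshold = laplace_max_noise_bound(scale, num_buckets, beta)
trials = 2000
# Count how often the largest noise magnitude exceeds the union-bound threshold.
exceed = sum(simulate_max_noise(scale, num_buckets, rng) > threshold for _ in range(trials))
print(f"threshold: {threshold:.3f}, exceed rate: {exceed / trials:.4f}")
```

Under this assumption, if the heaviest true bucket leads every other bucket by more than twice the threshold, adding noise cannot change which bucket is heaviest, which is the property the argument relies on.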
{corollary}[User-Level Coarse Estimation]
Let be a distribution over with mean and -th moment bounded by .
Then for all , there exists an -DP user-level algorithm that takes many users where
|
|
|
and outputs such that
|
|
|
where is the number of samples per user.
{proof}
This corollary follows readily from \Cref{thm:sample-level-coarse-esitmation} together with \Cref{thm:user-level-to-sample-level-reduction}, setting the accuracy parameter to .
{remark}
This suffices for the fine estimation setting, since both values that the clipping radius takes are larger than . Therefore, a coarse estimate with accuracy would be enough.