
Instance-Optimal Private Density Estimation in the Wasserstein Distance

Vitaly Feldman
Apple
   Audra McMillan
Apple
   Satchit Sivakumar (work partially done while the author was an intern at Apple)
Boston University
   Kunal Talwar
Apple
Abstract

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances.

For distributions $P$ over $\mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $\mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $\mathbb{R}^2$ extends to arbitrary metric spaces as it goes via hierarchically separated trees. As a special case, our results lead to instance-optimal private learning in TV distance for discrete distributions.

1 Introduction

Distribution estimation is a fundamental problem in statistics. In this work, we focus on the problem of learning the density of a distribution over a low-dimensional real space. Our motivation for studying this problem comes from practical problems such as estimating the population density in a geographical area (defined by a bounded two-dimensional space, e.g. $[0,\ell]^2$), learning the distribution of accuracy of a machine learning model (i.e. a distribution over $[0,1]$), and estimating the average temperature across latitude, longitude, and altitude (i.e. a distribution over $[0,\ell]^3$).

In this work, we are interested in the non-parametric version of this question, where we make no assumptions on the form of the distribution we are learning. This is frequently of interest in practice, where population densities, for example, may change over time (become more or less concentrated), and it is difficult to specify a meaningful parametric class that will simultaneously capture all densities of interest. Since estimation is often done using sensitive data (e.g. health data), our interest is in the differentially private version of this question, and consequently all our results are for that version. While we believe our results in the non-private setting are also novel and interesting, we view the private results as our main contribution.

Any statistical algorithm learning from samples is inexact. The appropriate gauge to measure the (in)accuracy of a density estimation algorithm depends on how this density estimate is used. In this work, we focus on the Wasserstein distance between the original distribution and the learnt distribution as our measure of accuracy. Known by many names (Earthmover distance, Kantorovich distance, Optimal Transport distance), this distance is defined over any distance metric $d$ as the minimum over all couplings $\pi$ from $P$ to $Q$ of the quantity $\mathbb{E}_{x\sim P}[d(x,\pi(x))]$. It is arguably one of the most natural ways to define distances between distributions over a metric space and has been extensively studied (see Section 4). We note that Wasserstein distance is particularly salient in many practical applications of density estimation where the geometry of the space is significant. As a simple example, when creating population density estimates, if the population is concentrated in a few cities, then outputting a distribution concentrated close to these cities (even if not exactly at the cities) is intuitively better than outputting a distribution that is more spread out. Metrics such as TV distance that do not incorporate the geometry of the space do not capture this nuance. Additionally, Wasserstein distance is versatile and can be adapted to the setting of interest by varying the metric. In the case of the metric being a discrete metric with $d(x,y)=\mathbf{1}(x\neq y)$, it reduces to the commonly used total variation (TV) distance.
Our focus in this work is on the case of the Euclidean distance metric on $[0,1]$ or $[0,1]^2$, though our results apply both to higher-dimensional Euclidean space and to any finite metric. In the $[0,1]$ case (with the standard Euclidean metric), the Wasserstein distance equals the total area between the cumulative distribution functions.
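As a concrete illustration of the last two points, the following minimal Python sketch computes the Wasserstein distance on an equally spaced grid in $[0,1]$ as the area between the two CDFs and contrasts it with TV distance. The function names and the five-point grid are assumptions made purely for illustration.

```python
from itertools import accumulate


def wasserstein_1d(p, q, spacing=1.0):
    """W1 between two pmfs on the same equally spaced grid (area between CDFs)."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same grid")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # Each grid step contributes |F_p - F_q| * spacing to the area.
    return spacing * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def total_variation(p, q):
    """TV distance, i.e. Wasserstein distance under the discrete metric."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Illustrative grid {0, 0.25, 0.5, 0.75, 1}: all mass at 0, moved near or far.
p = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print(wasserstein_1d(p, near, spacing=0.25), total_variation(p, near))
print(wasserstein_1d(p, far, spacing=0.25), total_variation(p, far))
```

Both alternatives are at TV distance 1 from `p`, yet moving the mass to the adjacent grid point costs 0.25 in Wasserstein distance, while moving it across the whole interval costs 1.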

The problem of learning a distribution under Wasserstein distance has a long history, starting with [Dud69] proving worst-case bounds on the rate of convergence of the Wasserstein distance between the empirical distribution $\hat{P}_n$ and the target distribution $P$ over $\mathbb{R}^d$. Similarly, this question for the case of the discrete metric ($d(x,y)=\mathbf{1}(x\neq y)$) has been very well studied. However, most known results for this problem look at it from the point of view of worst-case analysis. This can paint a rather pessimistic picture. For example, the minimax rate of $\varepsilon$-privately learning a discrete distribution over $\{0,\dots,k\}$ in TV distance (i.e. Wasserstein with the discrete metric described above) scales linearly with $k$, which can be prohibitive for large support size $k$. For Wasserstein distance with the $\ell_2$ norm, the rate of convergence of the empirical distribution suffers a curse of dimensionality, with the worst-case error between the distribution and the empirical distribution being $\Theta(n^{-\frac{1}{d}})$ for distributions over $[0,1]^d$.
For the differentially private version of this question, recent works [BSV22, HVZ23] have shown that the optimal Wasserstein minimax error between the sample and the private estimate is $\tilde{\Theta}((\varepsilon n)^{-\frac{1}{d}})$. This worst-case analysis viewpoint fails to distinguish between algorithms that perform very differently on the types of instances one may see in practice. In particular, many practical distributions may be more feasible to estimate than suggested by the minimax rate. As an example, Figure 1 shows the cumulative distribution function of a bimodal distribution on $[0,1]$ with very sparse support, and the cdf learnt by a minimax optimal algorithm, as well as by an algorithm we present in this work. As is clear from the figure, the minimax optimal algorithm is easily outperformed. This phenomenon only gets worse in higher dimensions. Similarly, if the distribution in $\mathbb{R}^d$ lies on a $k$-dimensional subspace, the worst-case error scaling with $\tilde{O}((\varepsilon n)^{-\frac{1}{d}})$ is significantly larger than our algorithm’s scaling of $\tilde{O}((\varepsilon n)^{-\frac{1}{k}})$.

Figure 1: (Left) A sparsely supported distribution on integers [0,999] (pdf). (Right) CDF for the same distribution (green, solid line), along with a (non-private) minimax optimal learnt distribution (blue, dashed line), as well as a 1-DP instance-optimal algorithm (red, dotted), both learnt from the same 1600 samples. The $W_1$ error for the minimax optimal algorithm is 13.4, whereas the DP estimated distribution has $W_1$ error of $0.86$. While this example is artificial, it demonstrates the large potential gap between minimax optimal and instance optimal algorithms on specific instances.

This motivates the problem of viewing this question through the lens of instance optimality (see the related work section for a discussion of other beyond-worst-case analysis approaches to this question, and Section 3 for a more in-depth discussion of our approach). Briefly, instance optimal algorithms are those that, on any given instance of the problem, are able to perform competitively with what any algorithm can do on this instance. Let $\mathcal{M}$ be a class of algorithms of interest (e.g. all $(\varepsilon,\delta)$-differentially private algorithms) and $\mathrm{cost}(\cdot,P)$ be a cost measure for an instance $P$. In our setting, we have a distribution $P$ over a metric space, and given a set $\hat{P}_n$ of $n$ samples from $P$, we want to learn an estimate $\mathcal{A}(\hat{P}_n)$ for the distribution. Our measure of performance is the Wasserstein distance $\mathcal{W}$, so $\mathrm{cost}(\mathcal{A},P)=\mathbb{E}[\mathcal{W}(P,\mathcal{A}(\hat{P}_n))]$. We would ideally like to say that an algorithm $\mathcal{A}$ is $\alpha$-instance optimal in a class $\mathcal{M}$ if for all instances $P$, and all $\mathcal{A}'\in\mathcal{M}$,

\[
\mathrm{cost}(\mathcal{A}(\hat{P}_n),P)\leq\alpha\cdot \mathrm{cost}(\mathcal{A}'(\hat{P}_n),P). \tag{InstanceOptimality-Ideal}
\]

The reader will have noticed that this definition is, however, impossible to achieve except for trivial classes $\mathcal{M}$. The algorithm $\mathcal{A}'$ that ignores its input and always outputs $P$ makes the right-hand side 0. However, this algorithm performs poorly on any distribution far from $P$ and so is not a reasonable benchmark. A common approach in many works is to measure the performance of the competing algorithm $\mathcal{A}'$ not just on the given instance, but on a small neighborhood around it. Thus we say that an algorithm $\mathcal{A}$ is $\alpha$-instance optimal amongst a class $\mathcal{M}$ with respect to a neighborhood function $\mathcal{N}$ if for all instances $P$, and all $\mathcal{A}'\in\mathcal{M}$,

\[
\mathrm{cost}(\mathcal{A}(\hat{P}_n),P)\leq\alpha\cdot\sup_{P'\in\mathcal{N}(P)} \mathrm{cost}(\mathcal{A}'(\hat{P'}_n),P').
\]

In other words, the benchmark we evaluate against is the cost of the best algorithm for a neighborhood $\mathcal{N}(P)$ that knows this neighborhood. We would like our algorithm $\mathcal{A}$, which is not tailor-made for $\mathcal{N}(P)$, to nevertheless be competitive against this benchmark.

This definition is general, and captures most notions of instance optimality that have been studied in the literature. The set $\mathcal{N}(P)$ must be carefully defined for this notion to be meaningful; we can always define $\mathcal{N}(P)$ to be the set of all instances, in which case this notion reduces to worst-case analysis. In many previous works, this neighborhood map has been defined to capture the belief that any natural algorithm must not have significantly different performance on different members of $\mathcal{N}(P)$. For example, [FLN01, ABC17, VV16, OS15, GKN20] include in $\mathcal{N}(P)$ appropriate renamings of $P$ to capture some kind of permutation invariance of natural algorithms. In statistics, one often enforces that the cardinality of $\mathcal{N}(P)$ is 2, often called the hardest one-dimensional subproblem [CL15, AD20, DLSV23]. Some recent works in privacy [HLY21, DKSS23] have defined instance optimality w.r.t. neighboring datasets obtained by deleting a small number of data points. Any reasonable definition of instance optimality for a problem must justify its choice of the neighborhood map; similar choices must be justifiable in every other notion of beyond worst case analysis [Rou21]. In instance-optimality definitions, this choice of neighborhood is what encapsulates what class of domain-specific algorithms our algorithm competes against. A good definition thus depends on the context and on the kind of domain knowledge we imagine an expert designing a custom algorithm for an application may have. Ideally, the definition is broad (i.e. the neighborhoods $\mathcal{N}(P)$ are sufficiently contained) so that in a large class of applications, we expect the domain knowledge to not be enough to rule out any member of $\mathcal{N}(P)$.
We discuss this general definition of instance optimality further in Section 3. We remark that for reasonable neighborhood maps, this is an extremely strong requirement: an instance-optimal algorithm must simultaneously do well on every single input, in fact as well as any other algorithm that is given this neighborhood $\mathcal{N}(P)$ in advance!

Instance optimality guarantees are most useful when there is a big difference between achievable utility guarantees for typical cases and the worst-case utility guarantees. Wasserstein estimation is an example of such a problem. We will see that achievable utility bounds for concentrated distributions, for example, are much better than those for worst-case distributions. Our definition of instance optimality is particularly suitable for metric spaces, and our notion of neighborhood allows the target utility bound to adapt to the distribution. We note that for estimation in Wasserstein distance with practically important metrics such as the $\ell_1$ and $\ell_2$ norms, it is unclear if existing instance optimality definitions (using notions of neighborhood discussed above) capture this. For example, for discrete distributions, setting the neighborhood to be all permutations of the distribution destroys all structure of the distribution (e.g. concentration), and hence performance on this neighborhood may not capture the relative ease of estimation of a concentrated distribution. Similar problems apply to other previously studied definitions of instance optimality, which are not well-suited to density estimation with error metrics that incorporate the geometry of the metric space. See Section 3 and Section 4 for further discussion on the inadequacy of existing instance optimality definitions for our setting of interest.

Our notion of neighborhood will correspond to small balls in one of the strictest notions of distance between distributions. Recall that for distributions $P,Q$ on $X$, $D_\infty(P,Q)$ is defined as $\sup_{x\in X}\max\left(\ln\frac{P(x)}{Q(x)},\ln\frac{Q(x)}{P(x)}\right)$. Our neighborhood map $\mathcal{N}$ will have the property that for all $P$, and for all $Q\in\mathcal{N}(P)$, $D_\infty(P,Q)\leq\ln 2$. This corresponds to the benchmark algorithm $\mathcal{A}'$ being given as auxiliary input a multiplicative constant factor approximation to the probability density function $P(x)$ (and we can replace the constant $2$ by any constant). In particular, an algorithm that knows the support of the distribution $P$ will not be able to do much better than our algorithm that gets no such information. Notice that this implies that our algorithm is able to exploit sparsity in the data distribution since it is competitive with an algorithm that is told the support. In the one-dimensional real case we can achieve an even stronger notion of instance-optimality.
In this case $\mathcal{N}(P)$ is defined to be $\{P,Q\}$ where $Q$ is a distribution with $D_\infty(P,Q)\leq\ln 2$. This is a strengthening of the rate defined by the hardest one-dimensional subproblem.
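To make the neighborhood condition concrete, the following minimal Python sketch evaluates $D_\infty$ for discrete distributions and checks membership in the factor-2 ball. The function names and the example probability vectors are assumptions chosen for illustration only.

```python
import math


def d_infinity(p, q):
    """sup_x max(ln p(x)/q(x), ln q(x)/p(x)) for pmfs on a common finite domain."""
    worst = 0.0
    for a, b in zip(p, q):
        if a == 0.0 and b == 0.0:
            continue  # point lies outside both supports
        if a == 0.0 or b == 0.0:
            return math.inf  # supports differ, so the ratio is unbounded
        worst = max(worst, abs(math.log(a / b)))
    return worst


def in_neighborhood(p, q, factor=2.0):
    """True if q is within a multiplicative `factor` of p pointwise."""
    return d_infinity(p, q) <= math.log(factor) + 1e-12


# Illustrative example: a sparse distribution and two candidate neighbors.
sparse = [0.5, 0.5, 0.0, 0.0]
reweighted = [0.25, 0.75, 0.0, 0.0]  # same support, ratios at most 2
spread = [0.25, 0.25, 0.25, 0.25]    # puts mass outside the support

print(in_neighborhood(sparse, reweighted))  # True
print(in_neighborhood(sparse, spread))      # False
```

Every distribution in the ball shares the support of `sparse`, which is why a benchmark restricted to this neighborhood effectively knows the support.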

We also give a definition that captures another aspect of instance optimality, related to the notion of super efficiency, that we term local minimality in Section 3. Informally, local minimality says that if any comparator algorithm does better than $\mathcal{A}$ on $P$, then there is a distribution $Q$ in the neighborhood of $P$ where $\mathcal{A}$ does better than the comparator. Approximate local minimality relaxes the latter condition to being better than some constant times the comparator. The two definitions of approximate local minimality and instance optimality are in general incomparable (see Section 3), but for suitably smooth algorithms, we show that these definitions are equivalent. Our algorithms, both for the one-dimensional case and for the case of general metric spaces, approximately satisfy both these definitions.

In order to show that the instance optimality definition is achievable, we give both algorithmic upper bounds and theoretical lower bounds that match up to logarithmic factors. The algorithms we use in our upper bounds are built largely from ingredients previously used for similar problems. We see this as an asset since these algorithms are implementable in practice. A key ingredient that we do introduce is the use of randomized HST approximation of finite metric spaces. This replaces deterministic hierarchical decompositions that were used in prior work, allowing us to gain tighter utility guarantees. Our main conceptual contribution is to introduce what we believe to be the right notion of instance optimality for this problem, including the definition of a meaningful neighborhood function. The main technical challenge is in the lower bounds, which require carefully building nets of distributions within each neighborhood $\mathcal{N}(P)$ that allow us to use a slight generalization of DP Assouad’s Lemma to give a lower bound on the target estimation rate for each distribution $P$.

1.1 Our Results

Given an estimation algorithm $\mathcal{A}:\mathcal{X}^n\to\mathcal{M}$, the estimation rate of $\mathcal{A}$ for distribution $P$ is:

\[
R_{\mathcal{A},n}(P)=\inf_{t\in\mathbb{R}}\left\{t:\text{w.p. $\geq 0.75$ over $\mathbf{x}\sim P^n$ and the randomness of the algorithm, }\mathcal{W}_d(\mathcal{A}(\mathbf{x}),P)\leq t\right\}. \tag{1}
\]
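Equation (1) says that the estimation rate is the 0.75-quantile of the Wasserstein error. As a rough illustration, the following Python sketch approximates this quantile by Monte Carlo simulation for a distribution on an equally spaced grid in $[0,1]$. The helper names, the non-private empirical estimator, and the trial count are all assumptions made for the sketch; a private algorithm could be plugged in through the same `estimator` argument.

```python
import math
import random
from itertools import accumulate


def w1_on_grid(p, q, spacing):
    """W1 between pmfs on a common equally spaced grid (area between CDFs)."""
    return spacing * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def empirical_pmf(samples, size):
    """Non-private baseline estimator: the empirical histogram."""
    counts = [0] * size
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]


def estimate_rate(pmf, n, estimator, trials=200, level=0.75, seed=0):
    """Monte Carlo approximation of the rate in (1) for a pmf with >= 2 grid points."""
    rng = random.Random(seed)
    size = len(pmf)
    spacing = 1.0 / (size - 1)
    errors = []
    for _ in range(trials):
        samples = rng.choices(range(size), weights=pmf, k=n)
        errors.append(w1_on_grid(estimator(samples, size), pmf, spacing))
    errors.sort()
    # Smallest observed t such that at least a `level` fraction of trials have error <= t.
    return errors[math.ceil(level * trials) - 1]


# Illustrative two-point distribution on the grid {0, 0.25, 0.5, 0.75, 1}.
pmf = [0.5, 0.0, 0.0, 0.0, 0.5]
print(estimate_rate(pmf, 20, empirical_pmf), estimate_rate(pmf, 2000, empirical_pmf))
```

The simulated value is only an approximation of $R_{\mathcal{A},n}(P)$, since it replaces the probability in (1) by a frequency over finitely many trials.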

We start by stating an informal version of our result in the one-dimensional real case.

Theorem 1.1 (Informal 1-dimensional result).

Let $\varepsilon,\gamma\in(0,1]$. There is an $\varepsilon$-differentially private algorithm $\mathcal{A}$ such that, for all distributions $P$ supported in $[0,1]$ and for all natural numbers $n>\frac{\operatorname{polylog} 1/\gamma}{\varepsilon}$, there exists a distribution $Q$ (with $D_\infty(P,Q)\leq\ln 2$) such that the following is satisfied.

For any $\varepsilon$-DP algorithm $\mathcal{A}'$, with probability at least $0.75$ over the randomness of $\mathbf{x}\sim P^n$ and additional randomness of the algorithm,

\[
\mathcal{W}(\mathcal{A}(\hat{P}_n),P)\leq\operatorname{polylog} n\cdot\sup_{P'\in\{P,Q\}}R_{\mathcal{A}',n'}(P')+\gamma
\]

where $n'\approx\frac{n}{\operatorname{polylog} n/\gamma}$.

In this one-dimensional case, our algorithm is based on DP quantile estimation. The additive $\gamma$ term can be made polynomially small. The lower bound is based on (differentially private) simple hypothesis testing where for each distribution $P$, we find a distribution in $\mathcal{N}(P)$ that is indistinguishable from $P$ given $n$ samples but also sufficiently far from $P$ in Wasserstein distance.
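For readers unfamiliar with DP quantile estimation, the following Python sketch shows one standard way to release a single quantile of data in $[0,1]$ under $\varepsilon$-DP, using the exponential mechanism over the intervals between consecutive sorted points. This is an illustrative stand-in and an assumption on our part: it is not necessarily the quantile subroutine analyzed in this paper, and the function name and parameters are hypothetical.

```python
import math
import random


def dp_quantile(data, q, epsilon, rng, lo=0.0, hi=1.0):
    """Exponential-mechanism quantile on [lo, hi] (illustrative sketch only)."""
    xs = sorted(min(max(x, lo), hi) for x in data)  # clip to the domain
    pts = [lo] + xs + [hi]
    n = len(xs)
    log_weights = []
    for i in range(n + 1):
        length = pts[i + 1] - pts[i]
        if length <= 0.0:
            log_weights.append(-math.inf)  # empty interval is never selected
        else:
            # Utility -|i - q*n|: i data points lie below interval i.
            # Changing one data point changes this utility by at most 1.
            log_weights.append(math.log(length) - 0.5 * epsilon * abs(i - q * n))
    top = max(log_weights)  # subtract the maximum to avoid underflow
    weights = [math.exp(lw - top) for lw in log_weights]
    i = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(pts[i], pts[i + 1])


rng = random.Random(0)
data = [0.1 * i for i in range(1, 10)]
print(dp_quantile(data, 0.5, 1.0, rng))
```

Estimating several quantiles this way and interpolating between them yields a CDF estimate; splitting the privacy budget across the quantiles is left out of this sketch.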

Extending the quantile-based approach from the one-dimensional setting to even the two-dimensional setting is challenging, as there is no “right” way to generalize quantiles to dimensions 2 or beyond. Several previous works on Wasserstein density estimation (e.g. [BNNR09]) have used a hierarchical decomposition approach to address this question. A hierarchical approach has also been used in various more practical works on private density estimation (e.g. [CB22, QYL12, BKM+21, MJT+22, ZXX16]). These works focus on practical performance and do not offer tight theoretical bounds. A hierarchical approach was also used by [GHK+23], who proved theoretical bounds for a related problem, but not through the lens of instance optimality. We compare our results to theirs in more detail later in this section.

The use of deterministic hierarchical decompositions in all these papers means that some points that are very close (but on opposite sides of the boundaries of the hierarchical decomposition) get mapped to relatively far points, resulting in high distortion factors that are not appropriate for instance optimality.

Inspired by the above approaches but noting their constraints, we use a randomized embedding into hierarchically separated trees instead of a deterministic one. We define our algorithm on any hierarchically separated tree metric and use the fact that there is a randomized embedding of $[0,1]^2$ on a hierarchically separated tree metric space with low distortion. This, along with some other important technical modifications (such as truncating low values to $0$), allows us to analyze a variant of the above practical algorithms theoretically and show that it satisfies our strong notion of instance optimality, up to polylogarithmic factors in the number of samples.
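The effect of randomization can be seen in a small Python sketch that uses a randomly shifted dyadic grid on $[0,1]^2$ as a simple stand-in for a randomized tree embedding. The construction, the function names, and the tree distance (defined here only up to constant factors as twice the side length of the grid at which two points first separate) are assumptions for illustration and need not match the embedding used in our analysis.

```python
import math
import random


def separation_level(x, y, shift, depth):
    """First level (cell side 2**-level) at which x and y fall in different cells."""
    for level in range(depth + 1):
        side = 2.0 ** -level
        cell_x = tuple(math.floor((c + s) / side) for c, s in zip(x, shift))
        cell_y = tuple(math.floor((c + s) / side) for c, s in zip(y, shift))
        if cell_x != cell_y:
            return level
    return None  # never separated down to the finest level


def tree_distance(x, y, shift, depth):
    """Tree distance up to constants: side of the last cell shared by x and y."""
    level = separation_level(x, y, shift, depth)
    return 0.0 if level is None else 2.0 ** (1 - level)


# Two points at Euclidean distance 0.002 on opposite sides of the line x = 0.5.
x, y = (0.499, 0.5), (0.501, 0.5)

# The unshifted (deterministic) grid always separates them at the coarsest split.
print(tree_distance(x, y, (0.0, 0.0), depth=10))  # 1.0

# Averaging over random shifts gives a much smaller typical tree distance.
rng = random.Random(0)
shifts = [(rng.random(), rng.random()) for _ in range(2000)]
print(sum(tree_distance(x, y, s, depth=10) for s in shifts) / len(shifts))
```

With the deterministic grid the pair is stretched by a factor of 500, whereas under a random shift a grid of side $s$ separates the pair with probability proportional to its distance divided by $s$, so the expected stretch grows only with the number of levels.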

Theorem 1.2 (Informal two-dimensional result).

There is a polynomial time $\varepsilon$-differentially private algorithm $\mathcal{A}$ that, for any distribution $P$ on $[0,1]^2$, any integer $n$, and any $\varepsilon$-DP algorithm $\mathcal{A}'$, with probability at least 0.75 satisfies

\[
\mathcal{W}_2(\mathcal{A}(\hat{P}_n),P)\leq(\log n)^{O(1)}\sup_{P':D_\infty(P,P')\leq\ln 2}\mathbb{E}[\mathcal{W}_2(\mathcal{A}'(\hat{P'}_{n'}),P')],
\]

where $n'\approx\frac{n}{\operatorname{polylog} n}$. Here, the expectation is taken over the internal coin tosses of $\mathcal{A}$ as well as over the choice of the i.i.d. samples $\hat{P}_n$.

In fact, since our algorithm is defined on any hierarchically separated tree metric space, it has the added bonus of giving instance optimality results for any finite metric space (since powerful results [Bar96, FRT03] show that any finite metric space can be embedded in a hierarchically separated tree metric space with a distortion factor at most logarithmic in the size of the metric space).

Theorem 1.3 (Informal finite metric result).

Let (𝒳,d)𝒳𝑑(\mathcal{X},d)( caligraphic_X , italic_d ) be an arbitrary metric space with diameter 1111. There is a polynomial time ε𝜀\varepsilonitalic_ε-differentially private algorithm 𝒜𝒜\mathcal{A}caligraphic_A such that for any distribution P𝑃Pitalic_P on X𝑋Xitalic_X any integer n𝑛nitalic_n and any ε𝜀\varepsilonitalic_ε-DP algorithm 𝒜superscript𝒜\mathcal{A}^{\prime}caligraphic_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT with probability at least 0.75, satisfies

𝒲(𝒜(P^n),P)𝒲𝒜subscript^𝑃𝑛𝑃\displaystyle\mathcal{W}(\mathcal{A}(\hat{P}_{n}),P)caligraphic_W ( caligraphic_A ( over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , italic_P ) (log|𝒳|logn)O(1)supP:D(P,P)ln2𝔼[𝒲(𝒜(P^n),P)],absentsuperscript𝒳𝑛𝑂1subscriptsupremum:superscript𝑃subscript𝐷𝑃superscript𝑃2𝔼𝒲superscript𝒜subscript^superscript𝑃superscript𝑛superscript𝑃\displaystyle\leq(\log|\mathcal{X}|\cdot\log n)^{O(1)}\sup_{P^{\prime}:D_{% \infty}(P,P^{\prime})\leq\ln 2}\operatorname*{\mathbb{E}}[\mathcal{W}(\mathcal% {A}^{\prime}(\hat{P^{\prime}}_{n^{\prime}}),P^{\prime})],≤ ( roman_log | caligraphic_X | ⋅ roman_log italic_n ) start_POSTSUPERSCRIPT italic_O ( 1 ) end_POSTSUPERSCRIPT roman_sup start_POSTSUBSCRIPT italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT : italic_D start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT ( italic_P , italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ≤ roman_ln 2 end_POSTSUBSCRIPT blackboard_E [ caligraphic_W ( caligraphic_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( over^ start_ARG italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_POSTSUBSCRIPT italic_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) , italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ] ,

where $n' \approx \frac{n}{\operatorname{polylog} n}$. Here, the expectation is taken over the internal coin tosses of $\mathcal{A}$ as well as over the choice of the i.i.d. samples $\hat{P}_n$.

Our lower bound result is actually slightly stronger than stated in Theorem 1.3 since it holds not only for $\varepsilon$-DP, but also for $(\varepsilon,\delta)$-DP. At this point, we also compare specifically to the work of [GHK+23], who give an algorithm for obtaining two-dimensional heatmaps and analyze it theoretically. They focus on the empirical version of a variant of this problem as opposed to the population version, and aim to compete with the best $k$-sparse distribution. Their algorithm takes the sparsity parameter $k$ as input in order to set parameters and achieves additive error $\sqrt{k}/n$ (and a constant multiplicative factor). In contrast, our algorithm also performs better for sparse distributions but is automatically adaptive to the sparsity (and hence does not need to take it as an input). Additionally, the additive term in our work can be made polynomially small (for any polynomial) in $n$ at a logarithmic cost to the multiplicative error (regardless of the sparsity of the distribution). On the other hand, for large $k$ their results have additive error that scales with $1/\sqrt{n}$. Their use of a deterministic hierarchical decomposition makes their algorithm unsuitable for our notion of instance optimality (as discussed earlier), and it is unclear if their algorithm can be directly extended to all finite metric spaces.

Note that instance optimality for all finite metric spaces implies instance optimality results for a wide variety of applications not addressed in prior work. For example, our results immediately extend to other low-dimensional real spaces with arbitrary metrics (e.g. $\ell_p$ norms). They also give non-trivial improvements on worst-case analysis for higher-dimensional spaces that are not the main focus of our work. For $[0,1]^d$, we can use a fine grid of size $(1/(\eta/\sqrt{d}))^d$ at an additive cost of $\eta$ in the Wasserstein distance in order to create a finite metric space to apply our result on. Since the dependence on $|\mathcal{X}|$ in the result above is logarithmic, this translates to a $d\log\frac{d}{\eta}$ multiplicative overhead term replacing the $\log|\mathcal{X}|$ factor above. While this is still a significant overhead, all previous results on density estimation in the Wasserstein distance (in both the private and non-private literature) are worst case, where the sample complexity is exponential in $d$. Since our results only have a polynomial dependence on $d$ over the optimal error, this is a non-trivial improvement over worst-case error, even when $d$ is large.

Another immediate application of our results is to give (to the best of our knowledge) new bounds for private estimation of discrete distributions in TV distance. Generally, for learning a discrete distribution defined by probabilities $\{p_1,\ldots,p_k\}$, our results lead to a rate (up to polylogarithmic factors) of
\[
\sum_i \min\left\{p_i(1-p_i), \sqrt{\frac{p_i(1-p_i)}{n}}\right\} + \sum_i \min\left(p_i, (1-p_i), \frac{1}{\varepsilon n}\right).
\]

This can give significant improvements over the worst-case bounds for practically important distributions. The minimax rate is linear in the support size $k$, namely $\Theta(k/(\varepsilon n))$ (for sufficiently small $\varepsilon$). Now, consider the following power-law distribution over support size $k$: $p(i) \propto i^{-2}$. (Power-law distributions arise frequently in practice; e.g. frequencies of family names and sizes of power outages follow power-law distributions.) Applying our result above gives a bound that is $\tilde{O}\left(\min\left\{\frac{k}{\varepsilon n}, \frac{1}{\sqrt{\varepsilon n}}\right\}\right)$, which is much better than the worst-case bound for large-support distributions.
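To make the comparison concrete, the following minimal sketch evaluates the two sums of the discrete rate numerically for a power-law distribution and compares them with the worst-case quantity $k/(\varepsilon n)$. The function names and the parameter values ($n$, $\varepsilon$, $k$) are illustrative assumptions, and polylogarithmic factors and constants are ignored.

```python
import math

def instance_rate(p, n, eps):
    """Evaluate the sampling sum and the privacy sum of the discrete rate.

    Polylogarithmic factors and constants are ignored; this is only a
    numerical illustration of the formula, not an estimator.
    """
    sampling = sum(min(pi * (1 - pi), math.sqrt(pi * (1 - pi) / n)) for pi in p)
    privacy = sum(min(pi, 1 - pi, 1 / (eps * n)) for pi in p)
    return sampling + privacy

def power_law(k, exponent=2.0):
    """Probabilities p(i) proportional to i^(-exponent) on {1, ..., k}."""
    weights = [i ** (-exponent) for i in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative parameters (assumptions, not taken from the paper).
n, eps, k = 10**6, 0.1, 10**5
rate = instance_rate(power_law(k), n, eps)
worst_case = k / (eps * n)
print(f"power-law rate: {rate:.4g}, worst-case k/(eps n): {worst_case:.4g}")
```

Evaluating the same function on the uniform distribution over $k$ elements gives a much larger value, which illustrates how the rate separates easy and hard instances.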

Our result also applies to other practically important settings such as building lists of popular sequences such as n-grams over words. We leave open the questions of designing instance-optimal algorithms for other practically important questions in private learning and statistics, and of designing better instance optimal algorithms for higher dimensional spaces. We also leave open the question of removing the polylogarithmic factors in our instance optimality bounds.

1.2 Techniques

1.2.1 Distributions over $\mathbb{R}$:

We start by describing the rate we obtain for distributions $P$ over $\mathbb{R}$. In order to state the rate, we will use $q_\alpha$ to represent the $\alpha$-quantile of the distribution $P$ and use $P|_{a,b}$ to denote a certain restricted distribution described below. The rate consists of three terms and roughly looks as follows (we suppress logarithmic factors in $n$).

\[
R_{\mathcal{A},n}(P) = \tilde{O}\left(\mathbb{E}\left[\mathcal{W}\left(P,\hat{P}_n\right)\right] + \frac{1}{\varepsilon n}\left(q_{1-\frac{1}{\varepsilon n}} - q_{\frac{1}{\varepsilon n}}\right) + \mathcal{W}\left(P, P|_{q_{\frac{1}{\varepsilon n}},\, q_{1-\frac{1}{\varepsilon n}}}\right)\right).
\]

The first term is $\mathbb{E}[\mathcal{W}(P,\hat{P}_n)]$, the expected Wasserstein distance between the true distribution and the empirical distribution over $n$ samples, and is the non-private term. The remaining two terms represent the cost of privacy. The first is a specific interquantile distance, roughly $\frac{1}{\varepsilon n}(q_{1-\frac{1}{\varepsilon n}} - q_{\frac{1}{\varepsilon n}})$. The second can be thought of as capturing the weight of the tails, represented by the Wasserstein distance between $P$ and a 'restricted' version of $P$ with its tails chopped off (i.e. the cumulative distribution function is $0$ below $q_{1/(\varepsilon n)}$, $1$ above $q_{1-1/(\varepsilon n)}$, and identical to that of $P$ otherwise). Observe that all three of the terms above are smaller for distributions with small support or greater concentration, and hence the rate adapts to the hardness of the distribution.
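As a rough illustration of the two privacy terms, the sketch below computes their empirical analogues from a sample. It assumes one reading of the restricted distribution: since its CDF is $0$ below the lower cut point and $1$ above the upper one, the tail mass is collapsed onto the cut points, so the Wasserstein distance to it is the average distance by which tail points must move. The index-based quantile convention and the function name are assumptions made for the example.

```python
import math

def privacy_terms(sample, eps):
    """Empirical analogues of the interquantile term and the tail term.

    Assumes eps * len(sample) > 2 so that the lower cut is below the upper cut.
    """
    xs = sorted(sample)
    n = len(xs)
    alpha = 1.0 / (eps * n)
    # Index-based empirical quantiles (a simplifying convention).
    lo = xs[min(n - 1, math.floor(alpha * n))]
    hi = xs[max(0, math.ceil((1 - alpha) * n) - 1)]
    interquantile = alpha * (hi - lo)
    # Distance to the restricted distribution: every point outside [lo, hi]
    # is moved to the nearest cut point.
    tail = sum(max(lo - x, 0.0) + max(x - hi, 0.0) for x in xs) / n
    return interquantile, tail

narrow = [i / 1000 for i in range(1000)]
wide = [10 * x for x in narrow]
print(privacy_terms(narrow, 0.1), privacy_terms(wide, 0.1))
```

Both terms grow when the same sample is stretched, and both vanish for a sample concentrated on a single point, which matches the qualitative behaviour described above.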

Upper Bounds: The upper bound involves estimating roughly $\varepsilon n$ equally spaced quantiles of the empirical distribution differentially privately (using a known private CDF estimation algorithm), and placing roughly $1/(\varepsilon n)$ mass at each of the estimated quantile points. For the analysis, the intuition for each of the terms is as follows. Since we only have access to the empirical distribution, we pay the non-private term $\mathbb{E}[\mathcal{W}(P,\hat{P}_n)]$. Next, if the quantile estimates are good, then the pointwise CDF differences between the empirical distribution and the estimated distribution are at most $1/(\varepsilon n)$ (due to the discretization), and so we pay $1/(\varepsilon n)$ multiplied by the interquantile distance of the empirical distribution. This aligns with the accuracy of state-of-the-art DP quantile estimation algorithms. Finally, since the estimated distribution is restricted to the estimated quantiles, its CDF is $0$ below the first estimated quantile and $1$ above the last estimated quantile, and so we pay the Wasserstein distance between the empirical distribution and a restricted version of the empirical distribution. Some care needs to be taken while reasoning about expectation versus high probability (for various terms), and in relating population quantities to empirical quantities (which we do using various concentration inequalities). Details can be found in Section 6.2.

Lower Bounds: We prove that the private and non-private terms are lower bounds separately. Both proofs follow the same framework. The idea is that given knowledge of two distributions $P$ and $Q$, we can use a (private) Wasserstein estimation algorithm to construct a hypothesis test distinguishing $P$ from $Q$. If the (private) estimate for $P$ and $Q$ with $n$ samples gives error smaller than $\frac{1}{2}\mathcal{W}(P,Q)$, we can use this to distinguish $P$ from $Q$. This would give a contradiction if $P$ and $Q$ are (privately) indistinguishable with $n$ samples. Hence, this would give a lower bound of $\frac{1}{2}\mathcal{W}(P,Q)$ on the error of the Wasserstein estimation algorithm on $P$ or $Q$.
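The step from a good estimate to a hypothesis test is a triangle inequality argument, which can be spelled out as follows. Here $\hat{A}$ denotes the output of the estimation algorithm (a symbol introduced only for this sketch), and the test outputs whichever of $P$ and $Q$ is closer to $\hat{A}$.

```latex
% Suppose the samples come from P and the estimate satisfies
%   W(\hat{A}, P) < (1/2) W(P, Q).
\[
\mathcal{W}(\hat{A}, Q)
  \;\geq\; \mathcal{W}(P, Q) - \mathcal{W}(\hat{A}, P)
  \;>\; \tfrac{1}{2}\mathcal{W}(P, Q)
  \;>\; \mathcal{W}(\hat{A}, P).
\]
% Hence \hat{A} is strictly closer to P than to Q, and the test answers "P".
% The symmetric argument applies when the samples come from Q.
```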

Thus the task reduces to constructing a distribution $Q$ that satisfies three properties: 1) it is (privately) indistinguishable from $P$ given $n$ samples, 2) the Wasserstein distance between $P$ and $Q$ is sufficiently large, 3) $D_\infty(P,Q) \leq \ln 2$. The main technical work is in identifying a distribution $Q$ that satisfies these properties.

For the privacy term, we construct the distribution $Q$ by taking half the mass from the first $1/(\varepsilon n)$-quantile of $P$ (scaling the density function by $1/2$) and moving it to the last $1/(\varepsilon n)$-quantile of $P$ (scaling the density function by $3/2$). The third property is satisfied by definition, so we reason about the other two. Intuitively, since the Wasserstein distance captures how hard it is to 'move' $P$ to $Q$, this mass needs to move at least the interquantile distance to change $P$ to $Q$. This implies that the Wasserstein distance is at least the interquantile distance scaled by $1/(\varepsilon n)$, as described in the rate. Additionally, mass that is further out in the tail needs to move more, which is captured by the Wasserstein distance between the distribution $P$ and its 'restriction'. Hence, the Wasserstein distance between $P$ and $Q$ is lower bounded by these two terms of interest. The intuition behind Property 1 is that it is hard for any $\varepsilon$-DP algorithm to pinpoint the location of a $\frac{1}{\varepsilon n}$-fraction of the points in the dataset. Overall, this shows the privacy lower bound.
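The construction can be checked numerically on a toy instance. The sketch below is a discrete analogue only: it takes $P$ to be uniform on a sorted list of points (an assumption made for the example), builds $Q$ by halving the weight of the first $m$ points and moving that mass onto the last $m$ points, and computes the Wasserstein distance on the line as the area between the two CDFs.

```python
import math

def tail_shift(points, m):
    """P is uniform on the sorted `points`; Q halves the first m weights
    and adds the removed mass to the last m points (requires 2 * m <= len)."""
    n = len(points)
    p = [1.0 / n] * n
    q = list(p)
    for i in range(m):
        q[i] = 0.5 / n
        q[n - 1 - i] = 1.5 / n
    return p, q

def w1_on_line(points, p, q):
    """Wasserstein-1 distance on the line as the area between the CDFs."""
    area, cdf_gap = 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_gap += p[i] - q[i]  # F_P(x_i) - F_Q(x_i)
        area += abs(cdf_gap) * (points[i + 1] - points[i])
    return area

points = [float(i * i) for i in range(100)]
m = 10
p, q = tail_shift(points, m)
d_inf = max(abs(math.log(a / b)) for a, b in zip(p, q))
interquantile_bound = (m / (2 * len(points))) * (points[-m] - points[m - 1])
print(d_inf <= math.log(2) + 1e-9, w1_on_line(points, p, q) >= interquantile_bound)
```

On this instance the density ratio stays within a factor of 2, and the distance is at least the moved mass times the gap between the two tails, mirroring the third and second properties respectively.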

The non-private lower bound requires a more careful construction of $Q$. We divide $P$ into various scales and carefully adjust them differently to obtain the desired properties. Formally, to construct $Q$ from $P$, we consider $q_{1/2}$ and all quantiles of the form $q_{1/2^i}$ and $q_{1-1/2^i}$ for $i > 1$. For $1 \leq i < \log n$, we add mass to $[q_{1/2^{i+1}}, q_{1/2^i})$ by setting the density $f_Q$ to be $(1+\sqrt{2^i/n}) f_P$, and balance out the extra mass by setting $f_Q$ to be $(1-\sqrt{2^i/n}) f_P$ on $[q_{1-1/2^i}, q_{1-1/2^{i+1}})$. For $i \geq \log n$ (i.e. the tail), we add mass to $[q_{1/2^{i+1}}, q_{1/2^i})$ by setting $f_Q$ to be $(1+\frac{1}{2}) f_P$, and balance out the extra mass by setting $f_Q$ to be $(1-\frac{1}{2}) f_P$ on $[q_{1-1/2^i}, q_{1-1/2^{i+1}})$.

The third property is again trivially satisfied. For the second property, observe that to 'move' $P$ to $Q$ the extra $\frac{1}{\sqrt{2^i n}}$ mass in $[q_{1/2^{i+1}}, q_{1/2^i})$ has to 'travel' between $q_{1/2^i}$ and $q_{1-1/2^i}$, and so the Wasserstein distance between $P$ and $Q$ can be lower bounded by a sum of various scaled interquantile distances. We then upper bound the expected Wasserstein distance between $P$ and $\hat{P}_n$ by a similar term. It is more intuitive to reason about this using an alternative (equivalent) formulation of the Wasserstein distance as the area between the CDF curves of the two distributions.
The intuition is that the expected pointwise CDF difference between $P$ and $\hat{P}_n$ in the interval $[q_{1/2^{i+1}}, q_{1/2^i})$ would be roughly $\frac{1}{\sqrt{2^i n}}$ (by properties of a Binomial) and hence the contribution of this interval to the area would be roughly $\frac{1}{\sqrt{2^i n}}\left(q_{1/2^i} - q_{1/2^{i+1}}\right)$, and similarly for the corresponding interval $[q_{1-1/2^i}, q_{1-1/2^{i+1}})$. Hence, the expected Wasserstein distance would be a sum of these scaled quantile interval distances. We formalize this intuition using a result of Bobkov and Ledoux [BL19] that characterizes the expected Wasserstein distance between $P$ and $\hat{P}_n$ as an integral of a function of the CDF of $P$.
We now have a bound in terms of a sum of scaled quantile interval distances, but we want to bound it by a sum of scaled interquantile distances. Telescoping the sum gives exactly such a bound. This establishes that $\mathcal{W}(P,Q) \geq \mathbb{E}[\mathcal{W}(P,\hat{P}_n)]$. Next, we show that $P$ is indistinguishable from $Q$ by analyzing the KL divergence between $P$ and $Q$. The main idea is that high density intervals are modified by a small multiplicative factor of roughly $1+\frac{1}{\sqrt{n}}$, but low density intervals (with mass less than $1/n$) are modified by a constant multiplicative factor, so overall the contribution of each interval to the KL divergence is sufficiently small. This establishes indistinguishability with $n$ samples. For formal details we refer the reader to Section 6.1.
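The KL calculation can be sketched heuristically as follows. The symbols $\delta_i = \sqrt{2^i/n}$ and $m_i = 2^{-(i+1)}$ are shorthand introduced for this sketch, and the bound assumes that $P$ is continuous (so each dyadic quantile interval has mass exactly $m_i$), that $\delta_i^2 \leq 1/2$, and that $\log n$ is an integer; the formal argument is in Section 6.1.

```latex
% Level i < log n: mass m_i on the left is scaled by (1 + delta_i) and
% mass m_i on the right by (1 - delta_i), so the contribution to KL(P || Q) is
\[
- m_i \ln(1+\delta_i) - m_i \ln(1-\delta_i)
  \;=\; - m_i \ln\left(1-\delta_i^2\right)
  \;\leq\; 2 m_i \delta_i^2
  \;=\; \frac{1}{n}.
\]
% Tail levels i >= log n: the factors are 3/2 and 1/2, so the total contribution is
\[
\sum_{i \geq \log n} m_i \ln\tfrac{4}{3}
  \;=\; \frac{\ln(4/3)}{n}.
\]
% Summing over all levels gives KL(P || Q) = O(\log n / n), so the KL divergence
% between n' i.i.d. samples from P and from Q is O(n' \log n / n).
```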

1.2.2 Distributions on HSTs

Since the main technical challenge of proving Theorem 1.3 is proving the equivalent result for distributions on HST metric spaces, we focus on that problem in this section. Standard results on low distortion embeddings of metric spaces into HST metric spaces can be used to translate the HST result to $[0,1]^2$ and to general metric spaces $X$ with $\log|X|$ overhead.

Definition 1.4 (Hierarchically Separated Tree).

A hierarchically separated tree (HST) is a rooted weighted tree such that the edges between level $\ell$ and $\ell-1$ all have the same weight (denoted $r_\ell$) and the weights are geometrically decreasing so $r_{\ell+1} = (1/2) r_\ell$. Let $D$ be the depth of the tree.

HSTs can be defined with any geometric scaling but we will only need a factor of 2 in this work. HSTs may also have arbitrary degree. An HST defines a metric on its leaf nodes by defining the distance between any two leaf nodes to be the weight of the minimum weight path between them.

HST metric spaces are particularly well-behaved when working with the Wasserstein distance since the Wasserstein distance on an HST has a simple closed form. A distribution $P$ on the underlying metric space in an HST induces a function $\mathfrak{G}_P$ on the nodes of the tree where the value of a node $\nu$ is given by the weight in $P$ of the leaf nodes in the subtree rooted at $\nu$. For every level $\ell \in [D]$ of the tree, where $D$ is the depth of the tree, let $P_\ell$ be the distribution induced on the nodes at level $\ell$ where the probability of node $\nu$ is $\mathfrak{G}_P(\nu)$. Thus $P_\ell$ is a discrete distribution on a domain of size $N_\ell$, where $N_\ell$ is the number of nodes in level $\ell$ of the tree.

Lemma 1.5 (Closed form Wasserstein distance formula).

Given two distributions $P$ and $Q$ defined in an HST metric space, the Wasserstein distance between $P$ and $Q$ has the closed formula:

\[
\mathcal{W}(P,Q) = \frac{1}{2}\sum_{\nu} r_\nu \left|\mathfrak{G}_P(\nu) - \mathfrak{G}_Q(\nu)\right| = \sum_{\ell} r_\ell\, \mathrm{TV}(P_\ell, Q_\ell),
\]

where $r_\nu$ is the weight of the edge connecting $\nu$ to its parent, and the sum is over all nodes in the tree.
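The closed form is straightforward to evaluate. The following minimal sketch does so on a complete binary HST, which is an assumption made for simplicity (HSTs may have arbitrary degree): leaves are indexed $0,\ldots,2^{D}-1$, the node at level $\ell$ above leaf $j$ is identified by the top $\ell$ bits of $j$, and the edge weights are taken to be $r_\ell = 2^{-\ell}$. It implements the formula exactly as stated in Lemma 1.5.

```python
def level_masses(leaf_probs, depth, level):
    """G_P restricted to one level: total mass of the leaves under each node."""
    masses = [0.0] * (1 << level)
    for leaf, prob in enumerate(leaf_probs):
        masses[leaf >> (depth - level)] += prob
    return masses

def hst_wasserstein(p, q, depth):
    """Sum over levels of r_l * TV(P_l, Q_l), with r_l = 2^(-l) (assumed)."""
    total = 0.0
    for level in range(1, depth + 1):
        r = 2.0 ** (-level)
        gp = level_masses(p, depth, level)
        gq = level_masses(q, depth, level)
        tv = 0.5 * sum(abs(a - b) for a, b in zip(gp, gq))
        total += r * tv
    return total

# Point masses on a depth-3 tree: sibling leaves versus opposite ends.
p = [1.0] + [0.0] * 7
q_near = [0.0, 1.0] + [0.0] * 6
q_far = [0.0] * 7 + [1.0]
print(hst_wasserstein(p, q_near, 3), hst_wasserstein(p, q_far, 3))
```

Moving mass between sibling leaves only changes the deepest level, whereas moving it across the root changes every level, so the second value is larger.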

We will call a node $\nu$ $\alpha$-active under the distribution $P$ if $\mathfrak{G}_P(\nu) \geq \alpha$. Let $\gamma_P(\alpha)$ be the set of $\alpha$-active nodes under $P$ and $\gamma_{P_\ell}(\alpha)$ be the set of $\alpha$-active nodes at level $\ell$. Then there exists an algorithm $\mathcal{A}$ such that given a distribution $P$, $\varepsilon > 0$, and $n \in \mathbb{N}$,

\[
\mathcal{R}_{\mathcal{A},n}(P) = \tilde{O}\left(\max_{\ell} r_\ell \sum_{x \in [N_\ell]} \min\left\{P_\ell(x)(1-P_\ell(x)), \sqrt{\frac{P_\ell(x)(1-P_\ell(x))}{n}}\right\} + \sum_{x \notin \gamma_{P_\ell}(2\kappa)} P_\ell(x) + \left(|\gamma_{P_\ell}(2\kappa)| - 1\right)\kappa\right),
\]

where the max is over all the levels of the tree and $\kappa = \Theta\left(\frac{\log(n)}{\varepsilon n}\right)$. Further, this bound matches (up to logarithmic factors) the lower bound $\min_{\varepsilon\text{-DP } \mathcal{A}'} \sup_{P' : D_\infty(P,P') \leq \ln 2} \mathbb{E}[\mathcal{W}(\mathcal{A}'(\hat{P'}_{n'}), P')]$, where $n' \approx \frac{n}{\operatorname{polylog} n}$. The error rate $\mathcal{R}_{\mathcal{A},n}$ does indeed adapt to easy instances as we expected. The error decomposes into three components. The first component is the non-private sampling error: the error that would occur even if privacy was not required.
The second component indicates that we cannot privately estimate the value of nodes that have probability less than $\approx 1/(\varepsilon n)$. The third component is the error due to privacy on the active nodes. If $P$ is highly concentrated then we expect most nodes to either be $\kappa$-active or have weight 0, so the first two terms in $\mathcal{R}_{\mathcal{A},n}(P)$ are small. There should also be few active nodes, making the last term also small. Conversely, if $P$ has a large region of low density then we expect a large number of inactive nodes, as well as non-zero inactive nodes that are at higher levels of the tree and hence contribute more to the final term. Thus, in distributions with high dispersion we expect the right hand side to be large.
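To see how the two privacy components behave, the sketch below computes, for each level of a complete binary HST (an assumption, as is taking the hidden constant in $\kappa$ to be 1), the mass on nodes that are not $2\kappa$-active and the count term $(|\gamma_{P_\ell}(2\kappa)|-1)\kappa$. It reports the per-level quantities only and does not combine them into the final rate; the count term is clipped at zero when a level has no active node, which is a choice made for this illustration.

```python
import math

def privacy_components(leaf_probs, depth, n, eps):
    """Per level: (level, mass on non-2k-active nodes, (active count - 1) * k)."""
    kappa = math.log(n) / (eps * n)  # hidden constant in Theta(.) taken as 1
    out = []
    for level in range(1, depth + 1):
        masses = [0.0] * (1 << level)
        for leaf, prob in enumerate(leaf_probs):
            masses[leaf >> (depth - level)] += prob
        active = [m for m in masses if m >= 2 * kappa]
        inactive_mass = sum(m for m in masses if m < 2 * kappa)
        out.append((level, inactive_mass, max(len(active) - 1, 0) * kappa))
    return out

concentrated = [1.0] + [0.0] * 15
uniform = [1.0 / 16] * 16
for name, dist in [("concentrated", concentrated), ("uniform", uniform)]:
    print(name, privacy_components(dist, 4, 100, 0.5))
```

For the point mass every level has a single active node and no inactive mass, while for the uniform distribution the deeper levels consist entirely of inactive nodes.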

Upper Bounds: As in the one-dimensional setting, we want to restrict to only privately estimating the density at a small number ($\approx\varepsilon n$) of points. While we could try to mimic the one-dimensional solution by privately estimating a solution to the $\varepsilon n$-median problem, it’s not clear how to prove that such an approach is instance-optimal. It turns out that a simpler solution, more amenable to analysis, will suffice. Our algorithm has two stages: first we attempt to find the set of $\kappa$-active nodes, then we estimate the weight of these active nodes. Since these nodes have weight greater than $\frac{\log(n)}{\varepsilon n}$, we can privately estimate them to within constant multiplicative error. Any nodes that are not detected as active are initially ascribed a weight of 0. The error due to not estimating the non-active nodes is absorbed into the third error term. The final step is to project the noisy density function into the space of distributions on the underlying metric space. The error of the upper bound algorithm is summed over all levels of the tree, although since the depth of the tree is logarithmic in the size of the metric space, this is within a logarithmic factor of the maximum over the levels.
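As a rough illustration of the two-stage structure just described, the following minimal sketch handles a single level of the tree, viewed as a histogram over its nodes. It is a simplification made under several assumptions and is not the algorithm analysed in this work: the function name `private_level_estimate`, the Laplace noise with scale $2/(\varepsilon n)$, the detection of active nodes by thresholding the noisy weights at $\kappa$ (with constants omitted), and the renormalisation used as a projection are all choices made here for illustration.

```python
import math
import random
from collections import Counter


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_level_estimate(samples, nodes, epsilon, rng):
    """Hypothetical single-level sketch: keep heavy nodes, zero the rest, project."""
    n = len(samples)
    counts = Counter(samples)
    kappa = math.log(n) / (epsilon * n)  # activity threshold, constants omitted

    # Replacing one sample changes two empirical weights by 1/n each,
    # so the L1 sensitivity of the weight vector is 2/n.
    scale = 2.0 / (epsilon * n)
    noisy = {v: counts[v] / n + laplace(rng, scale) for v in nodes}

    # Nodes whose noisy weight exceeds kappa are treated as active and keep
    # their noisy weight; all other nodes are ascribed weight 0.
    kept = {v: (w if w > kappa else 0.0) for v, w in noisy.items()}

    # Final step: map back to a probability distribution by renormalising
    # (uniform fallback if no node was detected as active).
    total = sum(kept.values())
    if total <= 0.0:
        return {v: 1.0 / len(nodes) for v in nodes}
    return {v: w / total for v, w in kept.items()}
```

Thresholding and renormalisation only post-process the noisy weights, so in this sketch the privacy cost is that of the single Laplace step.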

Lower Bound: We first observe that in order to estimate the distribution well in Wasserstein distance, an algorithm must estimate each level of the tree well in TV distance. This is derived from Lemma 1.5. This allows us to reduce to the problem of lower bounding the error of density estimation of discrete distributions in TV distance. The main tool we use is a differentially private version of Assouad’s method. Similar to how the technique in the previous section allowed us to relate lower bounding estimation rates to simple hypothesis testing, Assouad’s lemma allows us to relate lower bounding estimation rates to multiple hypothesis testing. Note that unlike the technique in the previous section, Assouad’s lemma allows us to prove lower bounds on the expected error, rather than lower bounds on high probability error bounds. It involves constructing nets of distributions in $\mathcal{N}(P)$ that are pairwise far in the relevant metric (which for us is the TV distance), but for which the multiple hypothesis testing problem between the distributions is sufficiently hard. To prove that the third term belongs in the lower bound, the standard statement of DP Assouad’s lemma [ASZ21] suffices, where one builds a set of distributions indexed by a hypercube. For the first and second terms, we need to slightly generalise the statement to allow for sets of distributions indexed by a product of hypercubes. We use the approximate DP version of DP Assouad’s lemma, so while our upper bounds are for pure differential privacy, our lower bounds hold for both pure and approximate differential privacy.

Let us start with the third term. Suppose the number of active nodes is even (a small tweak is made if there is an odd number of active nodes). We pair up the active nodes and index each pair by a coordinate of the hypercube. For each corner of the hypercube, $(u^{1},\cdots,u^{k})\in\{\pm 1\}^{k}$, and for each coordinate $j\in[k]$, if $u^{j}=+1$, we move $\tilde{O}(\kappa)$ mass from one node in the $j$th pair to the other node. If $u^{j}=-1$ then we leave the $j$th pair of nodes alone. Since each active node has mass $>\kappa$, it’s clear that each resulting distribution belongs in $\mathcal{N}(P)$. We can also show that these distributions form a sufficiently hard multiple hypothesis testing problem. By DP Assouad’s lemma (Lemma 5.8), this allows us to lower bound the estimation error by $\Omega(k\kappa)$, which is $\tilde{\Omega}$ of the third term, i.e., matches it up to polylogarithmic factors, when the number of active nodes is $\geq 2$. We treat the case where there is a single active node separately.
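The hypercube construction can be made concrete with a small sketch. The code below is illustrative only: the name `hypercube_family`, the pairing of active nodes in dictionary order, and the choice to move exactly $\kappa/2$ mass per pair are assumptions made here, not the constants used in the proof. With this choice, every active node has mass $>\kappa$, so every perturbed mass stays within a factor of 2 of the original, which corresponds to staying inside a $D_{\infty}$ ball of radius $\ln 2$.

```python
import itertools


def hypercube_family(weights, kappa):
    """Hypothetical sketch: perturb paired active nodes, one pair per coordinate."""
    active = [v for v, w in weights.items() if w > kappa]
    # Pair consecutive active nodes; a leftover unpaired node is left untouched.
    pairs = [(active[i], active[i + 1]) for i in range(0, len(active) - 1, 2)]
    delta = kappa / 2.0  # illustrative amount of mass to move (an assumption)

    family = {}
    for signs in itertools.product((+1, -1), repeat=len(pairs)):
        q = dict(weights)
        for (a, b), u in zip(pairs, signs):
            if u == +1:  # move delta mass from node a to node b
                q[a] -= delta
                q[b] += delta
        family[signs] = q
    return family


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)
```

In this sketch, two corners that differ in a single coordinate are at TV distance exactly `delta`, and the distance grows linearly with the number of differing coordinates, which is the additive structure that an Assouad-style argument relies on.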

For the second term, we want to pair up the inactive nodes in a similar manner and move half their mass from one node to the other. However, since we want to remain within $\mathcal{N}(P)$, we can’t pair arbitrary inactive nodes together. Thus, we divide the inactive nodes into scales, where nodes within a given scale all have weight within a multiplicative factor of two of each other. We then pair up nodes within each scale and have a different hypercube for each scale. Again, it’s clear that these distributions are all in $\mathcal{N}(P)$ and we can show that these distributions form a sufficiently hard multiple hypothesis testing problem. The proof for the first term follows similarly.

2 Preliminaries

For all distributions $P$, we will use $f_{P}$ to denote the density of $P$ (when it exists) and $F_{P}$ to denote the cumulative distribution function of $P$. Given a space $\mathcal{X}$, let $\Delta(\mathcal{X})$ be the set of distributions on the space $\mathcal{X}$. Given a logical statement $a$, let $\chi_{a}=0$ if $a$ is false and $\chi_{a}=1$ if $a$ is true. For example, $\chi_{0=0}=1$ and $\chi_{0=1}=0$.

A number of distances between distributions are important in this work. We start by defining the infinity divergence, which is important in the notion of instance optimality we use.

Definition 2.1 ($D_{\infty}$-divergence).

Given two distributions $P$ and $Q$ with the same support, the $\infty$-Rényi divergence is $D_{\infty}(P,Q)=\ln\sup_{t}\max\left\{\frac{P(t)}{Q(t)},\frac{Q(t)}{P(t)}\right\}$ if $P$ and $Q$ are discrete, and $D_{\infty}(P,Q)=\ln\sup_{t}\max\left\{\frac{f_{P}(t)}{f_{Q}(t)},\frac{f_{Q}(t)}{f_{P}(t)}\right\}$ if $P$ and $Q$ are continuous distributions on $\mathbb{R}$ that have density functions. If $P$ and $Q$ don’t have the same support, then $D_{\infty}(P,Q)=\infty$.
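For discrete distributions, Definition 2.1 translates directly into a few lines of code. The sketch below assumes distributions are represented as dictionaries from atoms to probabilities; the function name `d_infinity` is hypothetical.

```python
import math


def d_infinity(p, q):
    """Symmetric infinity divergence between discrete distributions given as dicts."""
    support_p = {t for t, w in p.items() if w > 0}
    support_q = {t for t, w in q.items() if w > 0}
    if support_p != support_q:
        # Different supports: the divergence is infinite by definition.
        return math.inf
    return math.log(max(max(p[t] / q[t], q[t] / p[t]) for t in support_p))
```

For instance, $P=(1/2,1/2)$ and $Q=(1/4,3/4)$ have largest probability ratio 2, so $D_{\infty}(P,Q)=\ln 2$ and $Q$ lies on the boundary of the $\ln 2$ ball around $P$.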

We will use $\text{\rm KL}(P,Q)$ to denote the KL-divergence, $H^{2}(P,Q)$ to denote the squared Hellinger divergence, and $\text{\rm TV}(P,Q)$ to denote the total variation distance. These are defined in Appendix A.

Wasserstein Distance:

The error metric that we use to judge our performance on the density estimation task is the $1$-Wasserstein distance (which we will call just the Wasserstein distance where it is clear from context). In this subsection, we define the Wasserstein distance.

Definition 2.2.

For any separable metric space $(E,D)$, let $P,Q$ represent Borel measures on $E$. Then, the $1$-Wasserstein distance between $P,Q$ is defined as

\[\mathcal{W}(P,Q)=\inf_{\pi}\int_{E}\int_{E}D(x,x_{0})\,d\pi(x,x_{0}),\]

where the infimum is over all measures $\pi$ on the product space $E\times E$ with marginals $P$ and $Q$ respectively.

Finally, for one-dimensional real spaces where the metric of interest is the $\ell_{1}$ norm, we will use the following equivalent formulation of Wasserstein distance extensively.

Lemma 2.3 (Wasserstein formula over $\mathbb{R}$).

Let $P,Q$ represent probability distributions on $\mathbb{R}$ with finite expectation. Then, the $1$-Wasserstein distance between $P,Q$ is equal to

\[\mathcal{W}(P,Q)=\int_{-\infty}^{\infty}|F_{P}(t)-F_{Q}(t)|\,dt,\]

where $F(\cdot)$ represents the cumulative distribution function.
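For finitely supported distributions the formula in Lemma 2.3 reduces to a finite sum, since both cumulative distribution functions are step functions. The following sketch assumes distributions are given as dictionaries from real-valued atoms to probabilities; the name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(p, q):
    """1-Wasserstein distance between finitely supported distributions on the real line.

    Both CDFs are constant between consecutive atoms, so the integral of
    |F_P - F_Q| is a sum of (CDF gap) x (interval length) terms.
    """
    atoms = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(atoms, atoms[1:]):
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total
```

For example, the point masses at 0 and at 1 are at distance 1, matching the cost of transporting all the mass across a unit interval.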

Given a metric space $\mathcal{X}$, the Wasserstein distance is a well-defined metric on the set of probability distributions over $\mathcal{X}$.

2.1 Differential Privacy

We start by defining the Hamming distance between datasets.

Definition 2.4 (Hamming Distance).

For any $n>0$ and two datasets ${\bf x}=(x_{1},\dots,x_{n}),{\bf x}'=(x'_{1},\dots,x'_{n})\in\mathcal{X}^{n}$, the Hamming distance $d_{Ham}({\bf x},{\bf x}')$ between the two datasets is defined as the number of entries that they disagree on, i.e. $\sum_{i=1}^{n}\mathrm{1}[x_{i}\neq x'_{i}]$.

Next, we define differential privacy.

Definition 2.5 (Differential Privacy [DMNS17, DKM+06]).

A randomized algorithm $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathcal{Y}$ is said to be $(\varepsilon,\delta)$-differentially private if for every pair of datasets ${\bf x},{\bf x}'\in\mathcal{X}^{n}$ such that $d_{Ham}({\bf x},{\bf x}')\leq 1$ and for all subsets $Y\subseteq\mathcal{Y}$,

\[\Pr[\mathcal{A}({\bf x})\in Y]\leq e^{\varepsilon}\cdot\Pr[\mathcal{A}({\bf x}')\in Y]+\delta.\]
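To make the inequality in Definition 2.5 concrete, the sketch below checks it exhaustively for randomized response on a single bit, a standard mechanism that is not otherwise used in this section. The dataset has size $n=1$, so the two possible inputs are neighbours. The function names are hypothetical, and the brute-force check over all output subsets is only feasible because the output space is tiny.

```python
import math
from itertools import combinations


def randomized_response_law(bit, epsilon):
    """Output distribution of randomized response: keep the bit w.p. e^eps / (1 + e^eps)."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: keep, 1 - bit: 1.0 - keep}


def satisfies_dp(laws, epsilon, delta=0.0, tol=1e-12):
    """Check the (epsilon, delta)-DP inequality for every output subset Y.

    `laws` maps each input to its output distribution; every pair of inputs
    is treated as neighbouring, which is correct for datasets of size one.
    """
    outputs = sorted(set().union(*laws.values()))
    for r in range(len(outputs) + 1):
        for subset in combinations(outputs, r):
            for x in laws:
                for x_prime in laws:
                    lhs = sum(laws[x].get(y, 0.0) for y in subset)
                    rhs = sum(laws[x_prime].get(y, 0.0) for y in subset)
                    if lhs > math.exp(epsilon) * rhs + delta + tol:
                        return False
    return True
```

Randomized response run with parameter 1 passes the check for $\varepsilon=1$ (with equality on singleton outputs) and fails it for $\varepsilon=0.5$.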

Differential privacy satisfies two important properties that we will utilise: it is closed under post-processing and composes naturally. More information about these properties and an example of a basic differentially private algorithm are given in Appendix A.

3 On Instance Optimality

In this section, we discuss the notion of instance optimality, and argue that it provides a useful benchmark that captures the idea of going beyond the worst case. The notion of instance optimality we propose can be seen as a generalisation of the hardest one-dimensional subproblem, or hardest local alternative, introduced by [CL15]. Suppose we have a family of distributions $\mathcal{P}\subset\Delta(\mathcal{X})$ on a space $\mathcal{X}$ and our goal is to learn the parameter $\theta:\mathcal{P}\to\mathcal{M}$ where $\mathcal{M}$ is a metric space with metric $d$. Given an estimation algorithm $\mathcal{A}:\mathcal{X}^{n}\to\mathcal{M}$, we can define the estimation rate\footnote{We note that while the estimation rate here is defined in expectation, we will sometimes show results (e.g., in the one-dimensional case) where the estimation rate is defined with probability at least $0.75$ over the randomness of the algorithm and the data; see Equation 1.} of $\mathcal{A}$ to be the function $\mathcal{R}_{\mathcal{A},n}:\mathcal{P}\to\mathbb{R}_{+}$ where

\[\mathcal{R}_{\mathcal{A},n}(P)=\mathbb{E}_{D\sim P^{n}}[d(\theta(P),\mathcal{A}(D))].\]

Since the estimation rate is a function of the distribution $P$, the estimation rate of an algorithm may be smaller at “easy” distributions and larger at “hard” distributions. As a classic example, consider the estimation rate of Bernoulli parameter estimation where $\mathcal{A}$ simply outputs the empirical mean. Then $\mathcal{R}_{\mathcal{A},n}(\texttt{Ber}(p))=\Theta(\min\{p(1-p),\sqrt{p(1-p)/n}\})$, so this algorithm performs better when the Bernoulli parameter is close to 0 or 1, and has its worst-case error when $p=1/2$.
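The Bernoulli example can be checked numerically. The sketch below computes the expected absolute error of the empirical mean exactly from the binomial probability mass function and compares it with the closed-form expression $\min\{p(1-p),\sqrt{p(1-p)/n}\}$. The function names are hypothetical, and the comparison is only meaningful up to constant factors.

```python
import math


def empirical_mean_rate(n, p):
    """Exact E|p_hat - p| for the empirical mean of n Bernoulli(p) samples."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * abs(k / n - p)
        for k in range(n + 1)
    )


def closed_form_rate(n, p):
    """The expression min{p(1-p), sqrt(p(1-p)/n)} quoted in the text."""
    return min(p * (1 - p), math.sqrt(p * (1 - p) / n))


if __name__ == "__main__":
    n = 100
    for p in (0.001, 0.01, 0.1, 0.5):
        exact = empirical_mean_rate(n, p)
        print(p, exact, exact / closed_form_rate(n, p))
```

The ratio between the two quantities stays bounded as $p$ ranges from near 0 to $1/2$, while the error itself grows, which is the sense in which the empirical mean adapts to easy instances.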

Cai and Low [CL15] proposed three desiderata that a target estimation rate $\mathcal{R}_{n}:\Delta(\mathcal{X})\to\mathbb{R}_{+}$ should satisfy in order to be a meaningful benchmark:

  1. $\mathcal{R}_{n}(P)$ varies significantly across $\mathcal{P}$.

  2. $\mathcal{R}_{n}$ is an achievable estimation rate; there exists an algorithm $\mathcal{A}$ and constant $\alpha$ such that $\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\mathcal{R}_{n}(P)$ for all $P\in\mathcal{P}$.

  3. Outperforming the benchmark $\mathcal{R}_{n}$ at one distribution leads to worse performance at another distribution.

In this section we discuss the definition of instance optimality used in this work by defining the target estimation rate that will serve as our benchmark. The main theorems of this paper establish that our chosen benchmark achieves desiderata 1 and 2 above. It is not immediately obvious that desideratum 3 holds. We will show in Section 3.2, through the introduction of a related notion of instance optimality which we call local minimality, that desideratum 3 holds in many important settings, including the problem studied in this paper.

3.1 Local Estimation Rates

We will start by defining a target estimation rate. We’ll say an algorithm is $\alpha$-instance optimal if it uniformly achieves this target estimation rate up to a multiplicative $\alpha$ factor. For each distribution $P\in\mathcal{P}$, we define a neighbourhood $\mathcal{N}(P)$.

Definition 3.1.

Given a function $\mathcal{N}:\mathcal{P}\to\mathfrak{P}(\mathcal{P})$, where $\mathfrak{P}(\mathcal{P})$ is the power set of $\mathcal{P}$, we define the optimal estimation rate with respect to $\mathcal{N}$ to be:

\[\mathcal{R}_{\mathcal{N},n}(P)=\min_{\mathcal{A}}\sup_{Q\in\mathcal{N}(P)}\mathcal{R}_{\mathcal{A},n}(Q). \tag{2}\]

An algorithm $\mathcal{A}$ is $\alpha$-instance optimal with respect to $\mathcal{N}$ if for all $P\in\mathcal{P}$,

\[\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\mathcal{R}_{\mathcal{N},n}(P).\]

If an algorithm $\mathcal{A}$ uniformly achieves the optimal estimation rate with respect to a function $\mathcal{N}$, then for all distributions $P$, the error of the algorithm $\mathcal{A}$ on $P$ is competitive with an algorithm that is told the additional information that the distribution is in $\mathcal{N}(P)$. Given a function $\mathcal{N}$, it is possible that there does not exist an algorithm that uniformly achieves $\mathcal{R}_{\mathcal{N},n}$. For example, as discussed in the introduction, if $\mathcal{N}(P)=\{P\}$, then $\mathcal{R}_{\mathcal{N},n}$ is not uniformly achievable. Conversely, if $\mathcal{N}(P)$ is not chosen carefully, then the estimation rate $\mathcal{R}_{\mathcal{N},n}$ may not define a meaningful benchmark, e.g. an estimation rate that adapts to easy instances.

A different formalization may be more probabilistic: the algorithm designer may have in mind a distribution $\mathcal{D}$ over distributions that they care about, and their objective may be to minimize $\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[\mathcal{R}_{\mathcal{A},n}(P)]$. Suppose that for the $\mathcal{A}^{\star}$ chosen by the algorithm designer, and for our neighborhood map $\mathcal{N}$, the function $\mathcal{R}_{\mathcal{A}^{\star},n}(P)$ does not vary too much over $\mathcal{N}(P)$ on average. Formally, let

\[disc_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)=\sup_{P'\in\mathcal{N}(P)}\left(\mathcal{R}_{\mathcal{A}^{\star},n}(P')-\mathcal{R}_{\mathcal{A}^{\star},n}(P)\right)\]

and let $\overline{disc}_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)=\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[disc_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)]$. Then for any algorithm $\mathcal{A}$ that is $\alpha$-instance optimal with respect to $\mathcal{N}$, we can write

\begin{align*}
\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[\mathcal{R}_{\mathcal{A},n}(P)] &\leq\alpha\cdot\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}\Big[\sup_{P'\in\mathcal{N}(P)}\mathcal{R}_{\mathcal{A}^{\star},n}(P')\Big]\\
&=\alpha\cdot\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}\big[\mathcal{R}_{\mathcal{A}^{\star},n}(P)+disc_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)\big]\\
&=\alpha\cdot\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[\mathcal{R}_{\mathcal{A}^{\star},n}(P)]+\alpha\cdot\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[disc_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)]\\
&=\alpha\cdot\left(\operatorname*{\mathbb{E}}_{P\sim\mathcal{D}}[\mathcal{R}_{\mathcal{A}^{\star},n}(P)]+\overline{disc}_{\mathcal{A}^{\star}}^{\mathcal{N}}(P)\right).
\end{align*}

In other words, as long as the algorithm $\mathcal{A}^{\star}$’s performance is relatively constant over $\mathcal{N}(P)$ on average over the distribution of interest, the instance optimal algorithm (that is not tailored to $\mathcal{D}$) is competitive with $\mathcal{A}^{\star}$. A similar result holds for a multiplicative definition of $disc$.

This discussion can help guide the choice of the neighborhood function that is appropriate for a particular application. In the case of density estimation in the Wasserstein distance, we will define $\mathcal{N}(P)$ to be a small $D_{\infty}$ ball around $P$. We believe this captures the kind of domain information an algorithm designer may have. For example, one may have a small amount of public data samples, in which case the posterior over distributions in a $D_{\infty}$ ball will be relatively constant. If the algorithm designer’s custom algorithm needs to do well for all distributions in this set, an instance-optimal algorithm will be competitive with this custom algorithm.

Previous work in instance optimality has largely focused on two notions of neighborhood. In [FLN01, ABC17, VV16, OS15], where the objects of interest are discrete subsets with no a priori structure, it is natural to ask that the algorithm work well for any permutation of the inputs. For example, if the goal is to compute the set of maximal points from a 2-d point set, the algorithm designer would typically want an algorithm that works well for any permutation of the set of input points. In our setting where the points of interest have a metric structure, this is not an appropriate notion. In fact, even for the discrete case studied in Section 5.2.2, permutation invariance cannot capture natural prior beliefs that may arise in practice. For example, for power-law distributions that one often sees in private learning applications [ZKM+20, CB22, CCD+23], a small number of samples are sufficient to get a good estimate of the heavy bins, and rule out a large fraction of permutations of the input space.

A second line of work arising from the statistics literature [CL15] has looked at defining instance-optimality with respect to neighborhoods of size $2$. While this approach has been very successful for many problems, we find it inappropriate for density estimation (outside of density estimation on $\mathbb{R}$) as neighborhoods of size two are too weak to capture the difficulty of problems of interest. Even in the simple case of discrete distributions, this neighborhood is provably insufficient to get instance-optimality results with any $o(K)$ competitive ratio. Indeed, for any two given distributions on $[K]$ with TV distance $\alpha$, $\tilde{O}(\frac{1}{\alpha^{2}})$ samples suffice to distinguish them, whereas learning a near uniform distribution on $K$ atoms requires $\Omega(K)$ samples. In the private setting, the need to use multiple distributions to prove lower bounds is well-studied. Our approach shares this similarity of using a multi-instance lower bounding argument with packing lower bounds in privacy, and local Fano’s and Le Cam’s methods in statistics. Our work shows that some of the same lower bounding techniques can be used to prove instance-optimality results with respect to natural neighborhood maps, going well beyond the worst-case results those works prove.

In the special case of density estimation in the Wasserstein distance on \mathbb{R}blackboard_R, instance optimality with respect to neighborhoods of size 2 is achievable. In the standard version of this benchmark metric, 𝒩(P)={P,QP}𝒩𝑃𝑃subscript𝑄𝑃\mathcal{N}(P)=\{P,Q_{P}\}caligraphic_N ( italic_P ) = { italic_P , italic_Q start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT } where QPsubscript𝑄𝑃Q_{P}italic_Q start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT can be any distribution and is chosen to maximise 𝒩,n(P)subscript𝒩𝑛𝑃\mathcal{R}_{\mathcal{N},n}(P)caligraphic_R start_POSTSUBSCRIPT caligraphic_N , italic_n end_POSTSUBSCRIPT ( italic_P ). However, this notion may not be an appropriate notion of instance optimality by itself. To see this, consider a distribution P𝑃Pitalic_P supported on an interval [a,b]𝑎𝑏[a,b][ italic_a , italic_b ]. Moving a small amount of mass from one end of the interval to the other would create an indistinguishable distribution that is far from P𝑃Pitalic_P in Wasserstein distance, and a hypothesis testing argument can be used to show that the target estimation rate defined above (for the hardest one-d sub problem) depends on the interval size ba𝑏𝑎b-aitalic_b - italic_a. This implies that the adaptivity of algorithms to support size of the distribution (crucial in Wasserstein estimation) is not captured by this notion of instance optimality. 
Instead, we add a further restriction to the definition to make it more appropriate for our setting; we only consider distributions $Q$ that are in a small $D_{\infty}$ ball around $P$ (that is, $D_{\infty}(P,Q)\leq\ln 2$), and ask that an algorithm is competitive with an algorithm that is told the additional information that the distribution is in $\{P,Q\}$ (in the worst case over distributions $Q$ that are in this $D_{\infty}$ ball). That is, we define the benchmark estimation rate to be

\[\mathcal{R}_{{\rm loc},n}(P)=\sup_{Q:D_{\infty}(P,Q)\leq\ln 2}\min_{\mathcal{A}}\{\mathcal{R}_{\mathcal{A},n}(Q),\mathcal{R}_{\mathcal{A},n}(P)\}. \tag{3}\]

Note that all such distributions $Q$ have the same support as $P$, which allows us to capture the adaptivity of algorithms to the support size of the distribution. Specifically, $\mathcal{R}_{{\rm loc},n}$ is our target estimation rate in the one-dimensional setting. In the case of estimating distributions on a bounded subset of $\mathbb{R}$, we will show that this error rate is achievable, up to logarithmic factors.
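To make the role of the $D_{\infty}$ restriction concrete, the following minimal Python sketch compares two perturbations of a toy discrete distribution on the line. The three atoms, the moved mass `delta`, and the symmetric form of $D_{\infty}$ used here (the largest absolute log-ratio of the probabilities) are illustrative assumptions and are not taken from the formal definitions of the paper.

```python
import math

def w1_distance(points, p, q):
    """1-Wasserstein distance between two distributions on sorted atoms in R,
    computed as the integral of the absolute difference of their CDFs."""
    total, cdf_gap = 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_gap += p[i] - q[i]
        total += abs(cdf_gap) * (points[i + 1] - points[i])
    return total

def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def d_infinity(p, q):
    """Assumed symmetric form of D_infinity: the largest |ln(p_i / q_i)|.
    It is infinite whenever the supports differ."""
    worst = 0.0
    for a, b in zip(p, q):
        if a == 0.0 and b == 0.0:
            continue
        if a == 0.0 or b == 0.0:
            return math.inf
        worst = max(worst, abs(math.log(a / b)))
    return worst

# Toy instance: P lives on {0, 1}, but the domain also contains the far point 100.
points = [0.0, 1.0, 100.0]
delta = 0.01
p = [0.5, 0.5, 0.0]
q_far = [0.5 - delta, 0.5, delta]         # moves mass outside the support of P
q_near = [0.5 - delta, 0.5 + delta, 0.0]  # moves mass within the support of P

for name, q in [("q_far", q_far), ("q_near", q_near)]:
    print(name, tv_distance(p, q), w1_distance(points, p, q),
          d_infinity(p, q) <= math.log(2))
```

Both perturbations are at TV distance `delta` from `p`, but only `q_near` lies in the $D_{\infty}$ ball, and its Wasserstein distance from `p` scales with the width of the support of `p` rather than with the width of the whole domain.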

We also note that our notion of instance optimality more naturally captures the accuracy of algorithms even for basic tasks. Note that for the Bernoulli case, our technique achieves a bound of $\frac{\sqrt{p(1-p)}}{\sqrt{n}}+\min\{p,1-p,\frac{1}{\varepsilon n}\}$, which appears to be better than the instance-optimal lower bounds in [MSU22], which take the form $\frac{\sqrt{p(1-p)}}{\sqrt{n}}+\frac{1}{\varepsilon n}$. This apparent contradiction can be explained by the use in [MSU22] of the hardest one-dimensional sub-problem to define the instance-optimal rate, i.e., $\mathcal{N}(P)$ is $\{P,Q\}$ for a worst-case Bernoulli $Q$. On the other hand, the notion of instance-optimality we use would only consider Bernoullis $Q$ such that $D_{\infty}(P,Q)\leq\ln 2$. When $p$ is close to $0$, the lower bound in [MSU22] would on this instance consider $Q$ to be $\mathrm{Bern}(p+\frac{1}{\varepsilon n})$, which can have a large $D_{\infty}$-distance from $P$, and so isn’t in the neighborhood used in our notion of instance-optimality.
Hence, the target rate one would obtain from our definition is smaller when $p$ is close to $0$. Our algorithm can achieve this improved rate, as it is likely to output $0$ as an estimate of $p$ in this case, pushing small counts down to zero.
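The gap between the two Bernoulli rates can be seen numerically. The sketch below evaluates both expressions quoted above and checks whether the alternative $\mathrm{Bern}(p+\frac{1}{\varepsilon n})$ falls inside the $D_{\infty}$ ball. The values of `n`, `eps` and `p` are hypothetical, and the symmetric form of $D_{\infty}$ (largest absolute log-ratio over the two outcomes) is an assumption made for illustration.

```python
import math

def our_rate(p, n, eps):
    """Bound stated in the text: sqrt(p(1-p)/n) + min{p, 1-p, 1/(eps*n)}."""
    return math.sqrt(p * (1 - p) / n) + min(p, 1 - p, 1 / (eps * n))

def msu22_rate(p, n, eps):
    """Form of the [MSU22] lower bound quoted in the text."""
    return math.sqrt(p * (1 - p) / n) + 1 / (eps * n)

def bernoulli_d_infinity(p, q):
    """Largest absolute log-ratio over the outcomes {0, 1} (assumed symmetric form)."""
    return max(abs(math.log(p / q)), abs(math.log((1 - p) / (1 - q))))

n, eps = 1000, 0.1      # hypothetical sample size and privacy parameter
p = 1e-4                # a Bernoulli parameter close to 0
q = p + 1 / (eps * n)   # alternative used by the hardest one-dimensional sub-problem

print("our rate        :", our_rate(p, n, eps))
print("[MSU22] form    :", msu22_rate(p, n, eps))
print("Q in D_inf ball :", bernoulli_d_infinity(p, q) <= math.log(2))
```

For these values the alternative $Q$ is far outside the $D_{\infty}$ ball, which is why the two benchmarks can disagree; for $p$ bounded away from $0$ and $1$ the two expressions coincide.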

Recent differentially private algorithms such as those in [HLY21, DKSS23] have shown instance-optimality for problems such as mean estimation. Relatedly, other works have designed algorithms that adapt to the local/smooth/deletion sensitivity of the underlying function. An instance in these works is a dataset rather than a distribution, and it is not clear how to extend the corresponding notion of neighborhood to our setting. Our neighborhood notion perhaps comes closest to the deletion neighborhoods considered in some of these works.

Finally, we remark that while we have stated our results as being competitive with the worst-case instance in $\mathcal{N}(P)$, they apply for the average case over a specific distribution over $\mathcal{N}(P)$. Since that specific distribution is adversarial, we don’t view this version as more natural than the worst case.

Given that we are focusing on private estimation, we will use $\mathcal{R}_{{\rm loc},n,\varepsilon}$ to denote the version of Eqn 3 where the minimum is taken over all $\varepsilon$-DP mechanisms, and $\mathcal{R}_{\mathcal{N},n,\varepsilon}$ to denote the optimal $\varepsilon$-DP estimation rate, i.e. Eqn 2 where the minimum is taken over all $\varepsilon$-DP mechanisms.

3.2 Locally Minimal Algorithms

In this section we address the third desideratum of [CL15]. An important concept in statistics is that of efficiency of an estimator, which informally compares the rate of convergence of the estimator with a benchmark that in general is not beatable. This idea has been used to argue that for some fundamental estimation problems, the Maximum Likelihood Estimator (MLE) is the best possible. Hodges showed an example of a superefficient estimator that is asymptotically as good as the MLE everywhere, but beats the MLE on a certain set of inputs. The statistics community has argued in multiple ways that these superefficient estimators do not limit our ability to argue that MLE is “optimal”. We refer the reader to [vdV97, Wol65, Vov09] for a discussion of superefficiency. One of the more compelling arguments here is a result saying that the set of points where superefficiency is achieved has Lebesgue measure zero. This in particular implies that in a small neighborhood around any point, there is a point (in fact many points) where the superefficient estimator does no better than the MLE. In the partial order on estimators, the MLE is thus minimal and this is true even when looking at the performance of the estimator only on a small neighborhood around a given point.

This motivates a slightly different notion capturing the goodness of the algorithm locally.

Definition 3.2.

Let $\mathcal{M}$ be a class of algorithms. We say that an algorithm $\mathcal{A}$ is $\alpha$-locally minimal with respect to a neighborhood map $\mathcal{N}$ if, for all instances $P$ and all $\mathcal{A}^{\prime}\in\mathcal{M}$, there is a $Q\in\mathcal{N}(P)$ such that $\mathcal{R}_{\mathcal{A},n}(Q)\leq\alpha\cdot\mathcal{R}_{\mathcal{A}^{\prime},n}(Q)$.

In words, local minimality says that for any other $\mathcal{A}^{\prime}$, the algorithm $\mathcal{A}$ is competitive with $\mathcal{A}^{\prime}$ for some instance in the neighborhood of $P$. Put differently, no $\mathcal{A}^{\prime}$ can be uniformly much better than $\mathcal{A}$ on the neighborhood, even one that knows $P$.

We show that in general, this notion is incomparable to our notion of instance optimality. Nevertheless, under reasonable assumptions, the two notions are closely related.

Example 3.3 (Local Minimality $\not\Rightarrow$ Instance Optimality).

Consider a pair of instances $\{P,Q\}$ with $\mathcal{N}(P)=\mathcal{N}(Q)=\{P,Q\}$. Let $\mathcal{M}$ contain two algorithms $\mathcal{A}$ and $\mathcal{A}^{\star}$ with

\begin{align*}
\mathcal{R}_{\mathcal{A},n}(P)&=1; & \mathcal{R}_{\mathcal{A},n}(Q)&=0;\\
\mathcal{R}_{\mathcal{A}^{\star},n}(P)&=0; & \mathcal{R}_{\mathcal{A}^{\star},n}(Q)&=0.
\end{align*}

Then one can verify that $\mathcal{A}$ is ($1$-)locally minimal in $\mathcal{M}$. However, it is not $\alpha$-instance optimal for any finite $\alpha$ as it fails to satisfy the definition at $P$.

Example 3.4 (Instance Optimality $\not\Rightarrow$ Local Minimality).

Consider a set of instances $\{P_{1},P_{2},P_{3}\}$ with $\mathcal{N}(P_{1})=\{P_{1},P_{2}\}$, $\mathcal{N}(P_{2})=\{P_{1},P_{2},P_{3}\}$, $\mathcal{N}(P_{3})=\{P_{2},P_{3}\}$. Let $\mathcal{M}$ contain algorithms $\mathcal{A},\mathcal{A}^{\star}$ with

\begin{align*}
\mathcal{R}_{\mathcal{A}^{\star},n}(P_{1})&=1; & \mathcal{R}_{\mathcal{A},n}(P_{1})&=2\alpha;\\
\mathcal{R}_{\mathcal{A}^{\star},n}(P_{2})&=2\alpha; & \mathcal{R}_{\mathcal{A},n}(P_{2})&=4\alpha^{2};\\
\mathcal{R}_{\mathcal{A}^{\star},n}(P_{3})&=4\alpha^{2}; & \mathcal{R}_{\mathcal{A},n}(P_{3})&=4\alpha^{2}.
\end{align*}

Then one can verify that $\mathcal{A}$ is ($1$-)instance optimal in $\mathcal{M}$. However, it is not $\alpha$-locally minimal at $P_{1}$.
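The claims in Examples 3.3 and 3.4 can be checked mechanically. The following is a minimal sketch, assuming that $\alpha$-instance optimality in $\mathcal{M}$ means $\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\cdot\min_{\mathcal{A}^{\prime}\in\mathcal{M}}\sup_{Q\in\mathcal{N}(P)}\mathcal{R}_{\mathcal{A}^{\prime},n}(Q)$ for every $P$; the dictionary encoding, the function names, and the choice $\alpha=2$ for Example 3.4 are illustrative.

```python
def benchmark(rates, neighborhood):
    """Min over algorithms in M of the worst rate over the neighborhood."""
    return min(max(r[q] for q in neighborhood) for r in rates.values())

def is_instance_optimal(rates, nbrs, alg, alpha):
    """Assumed reading: R_A(P) <= alpha * benchmark(N(P)) for every instance P."""
    return all(rates[alg][p] <= alpha * benchmark(rates, nbrs[p]) for p in nbrs)

def is_locally_minimal(rates, nbrs, alg, alpha):
    """Definition 3.2: for every P and every A' in M there is a Q in N(P)
    with R_A(Q) <= alpha * R_A'(Q)."""
    return all(
        any(rates[alg][q] <= alpha * other[q] for q in nbrs[p])
        for p in nbrs
        for other in rates.values()
    )

# Example 3.3: two instances sharing one neighborhood.
rates_33 = {"A": {"P": 1.0, "Q": 0.0}, "A_star": {"P": 0.0, "Q": 0.0}}
nbrs_33 = {"P": ["P", "Q"], "Q": ["P", "Q"]}

# Example 3.4, instantiated with alpha = 2 (illustrative choice).
alpha = 2.0
rates_34 = {
    "A_star": {"P1": 1.0, "P2": 2 * alpha, "P3": 4 * alpha**2},
    "A": {"P1": 2 * alpha, "P2": 4 * alpha**2, "P3": 4 * alpha**2},
}
nbrs_34 = {"P1": ["P1", "P2"], "P2": ["P1", "P2", "P3"], "P3": ["P2", "P3"]}

print("3.3 locally minimal (alpha=1):   ", is_locally_minimal(rates_33, nbrs_33, "A", 1.0))
print("3.3 instance optimal (alpha=1e9):", is_instance_optimal(rates_33, nbrs_33, "A", 1e9))
print("3.4 instance optimal (alpha=1):  ", is_instance_optimal(rates_34, nbrs_34, "A", 1.0))
print("3.4 locally minimal (alpha=2):   ", is_locally_minimal(rates_34, nbrs_34, "A", alpha))
```

In Example 3.3 the benchmark at $P$ is $0$, so no finite multiplicative factor can rescue $\mathcal{A}$; in Example 3.4 the failure of local minimality comes from the comparison with $\mathcal{A}^{\star}$ on the neighborhood of $P_{1}$.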

Under smoothness assumptions on $\mathcal{A}$ with respect to $\mathcal{N}$, one can argue that the two notions are essentially equivalent.

Proposition 3.5.

Let $\mathcal{A}$ be such that for all instances $P$ and for all $Q\in\mathcal{N}(P)$, $\mathcal{R}_{\mathcal{A},n}(Q)\leq\beta\cdot\mathcal{R}_{\mathcal{A},n}(P)$. Further, suppose that $\mathcal{N}(P)$ is compact for any $P$. If $\mathcal{A}$ is $\alpha$-instance optimal in $\mathcal{M}$ with respect to $\mathcal{N}$, then it is $\alpha\beta$-locally minimal.

Proof.

Let $P$ be an instance and let $\mathcal{A}^{\prime}$ be a competing algorithm. By definition of $\alpha$-instance optimality,

\[\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\cdot\sup_{Q\in\mathcal{N}(P)}\mathcal{R}_{\mathcal{A}^{\prime},n}(Q).\]

By compactness, there is a $Q$ achieving the supremum. In other words, there exists $Q^{\star}\in\mathcal{N}(P)$ such that

\[\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\cdot\mathcal{R}_{\mathcal{A}^{\prime},n}(Q^{\star}).\]

Since $Q^{\star}\in\mathcal{N}(P)$, our smoothness assumption implies that

\[\mathcal{R}_{\mathcal{A},n}(Q^{\star})\leq\beta\cdot\mathcal{R}_{\mathcal{A},n}(P).\]

Combining the last two inequalities, this $Q^{\star}$ satisfies

\[\mathcal{R}_{\mathcal{A},n}(Q^{\star})\leq\alpha\beta\cdot\mathcal{R}_{\mathcal{A}^{\prime},n}(Q^{\star}).\]

Since $P$ and $\mathcal{A}^{\prime}$ were arbitrary, this implies that $\mathcal{A}$ is $\alpha\beta$-locally minimal. ∎

Proposition 3.6.

Let $\mathcal{A}$ be such that for all instances $P$ and for all $Q\in\mathcal{N}(P)$, $\mathcal{R}_{\mathcal{A},n}(Q)\geq\beta^{-1}\cdot\mathcal{R}_{\mathcal{A},n}(P)$. If $\mathcal{A}$ is $\alpha$-locally minimal in $\mathcal{M}$ with respect to $\mathcal{N}$, then it is $\alpha\beta$-instance optimal.

Proof.

Let $P$ be an instance and let $\mathcal{A}^{\prime}$ be a competing algorithm. By definition of $\alpha$-local minimality, there is a $Q^{\star}\in\mathcal{N}(P)$ such that

\[\mathcal{R}_{\mathcal{A},n}(Q^{\star})\leq\alpha\cdot\mathcal{R}_{\mathcal{A}^{\prime},n}(Q^{\star}).\]

Since $Q^{\star}\in\mathcal{N}(P)$, our smoothness assumption implies that

\[\mathcal{R}_{\mathcal{A},n}(P)\leq\beta\cdot\mathcal{R}_{\mathcal{A},n}(Q^{\star}).\]

Combining the last two inequalities, this $Q^{\star}$ satisfies

\begin{align*}
\mathcal{R}_{\mathcal{A},n}(P)&\leq\alpha\beta\cdot\mathcal{R}_{\mathcal{A}^{\prime},n}(Q^{\star})\\
&\leq\alpha\beta\cdot\sup_{Q\in\mathcal{N}(P)}\mathrm{cost}(\mathcal{A}^{\prime}(Q),Q).
\end{align*}

Since $P$ and $\mathcal{A}^{\prime}$ were arbitrary, this implies that $\mathcal{A}$ is $\alpha\beta$-instance optimal. ∎
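As a sanity check on Propositions 3.5 and 3.6, the following sketch samples random finite problems and asserts both implications. It is only a numerical illustration, not part of the proofs: the problem sizes, the rate range, the convention that every neighborhood contains its own center, and the reading of instance optimality as a comparison against the best algorithm in a finite class are all assumptions made here.

```python
import random

def benchmark(rates, neighborhood):
    """Min over algorithms of the worst rate over the neighborhood."""
    return min(max(r[q] for q in neighborhood) for r in rates)

def instance_optimal(rates, nbrs, alpha, tol=1e-9):
    """Algorithm 0 satisfies R_A(P) <= alpha * benchmark(N(P)) for all P."""
    a = rates[0]
    return all(a[p] <= alpha * benchmark(rates, nbrs[p]) * (1 + tol)
               for p in range(len(nbrs)))

def locally_minimal(rates, nbrs, alpha, tol=1e-9):
    """For all P and A' there is Q in N(P) with R_A(Q) <= alpha * R_A'(Q)."""
    a = rates[0]
    return all(any(a[q] <= alpha * other[q] * (1 + tol) for q in nbrs[p])
               for p in range(len(nbrs)) for other in rates)

def random_problem(rng, num_instances=6, num_algorithms=4):
    # Every neighborhood is finite (hence compact) and contains P itself.
    nbrs = [sorted({p} | {q for q in range(num_instances) if rng.random() < 0.4})
            for p in range(num_instances)]
    rates = [[rng.uniform(0.1, 10.0) for _ in range(num_instances)]
             for _ in range(num_algorithms)]
    return rates, nbrs

rng = random.Random(0)
for _ in range(200):
    rates, nbrs = random_problem(rng)
    a = rates[0]
    pairs = [(p, q) for p in range(len(nbrs)) for q in nbrs[p]]
    # Smallest constants for which each hypothesis holds for algorithm 0.
    beta_up = max(a[q] / a[p] for p, q in pairs)   # R_A(Q) <= beta * R_A(P)
    beta_low = max(a[p] / a[q] for p, q in pairs)  # R_A(Q) >= R_A(P) / beta
    alpha_io = max(a[p] / benchmark(rates, nbrs[p]) for p in range(len(nbrs)))
    alpha_lm = max(min(a[q] / other[q] for q in nbrs[p])
                   for p in range(len(nbrs)) for other in rates)
    assert instance_optimal(rates, nbrs, alpha_io)
    assert locally_minimal(rates, nbrs, alpha_lm)
    assert locally_minimal(rates, nbrs, alpha_io * beta_up)    # Proposition 3.5
    assert instance_optimal(rates, nbrs, alpha_lm * beta_low)  # Proposition 3.6
print("both implications held on all sampled problems")
```

The small relative tolerance `tol` only guards against floating-point rounding when an inequality holds with equality.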

A similar pair of results holds when the comparator algorithm $\mathcal{A}^{\prime}$ is smooth with respect to the neighborhood map.

3.3 Relaxed Definitions

We finish by noting relaxations of the above definitions that share the same semantic meaning (our algorithms will achieve these relaxed notions).

Definition 3.7.

Given a function $\mathcal{N}:\mathcal{P}\to\mathfrak{P}(\mathcal{P})$, where $\mathfrak{P}(\mathcal{P})$ is the power set of $\mathcal{P}$, we define the optimal estimation rate with respect to $\mathcal{N}$ to be:

\[\mathcal{R}_{\mathcal{N},n}(P)=\min_{\mathcal{A}}\sup_{Q\in\mathcal{N}(P)}\mathcal{R}_{\mathcal{A},n}(Q). \tag{4}\]

An algorithm $\mathcal{A}$ is $(\alpha,\beta,\gamma)$-instance optimal with respect to $\mathcal{N}$ if for all $P\in\mathcal{P}$,

\[\mathcal{R}_{\mathcal{A},n}(P)\leq\alpha\mathcal{R}_{\mathcal{N},\beta n}(P)+\gamma.\]
Definition 3.8.

Let $\mathcal{M}$ be a class of algorithms. We say that an algorithm $\mathcal{A}$ is $(\alpha,\beta,\gamma)$-locally minimal with respect to a neighborhood map $\mathcal{N}$ if, for all instances $P$ and all $\mathcal{A}^{\prime}\in\mathcal{M}$, there is a $Q\in\mathcal{N}(P)$ such that $\mathcal{R}_{\mathcal{A},n}(Q)\leq\alpha\cdot\mathcal{R}_{\mathcal{A}^{\prime},\beta n}(Q)+\gamma$.

Note that we think of $\beta\in(0,1]$ and $\gamma$ as non-negative. These are relaxed definitions because we allow for an additive approximation factor in addition to a multiplicative factor, and also compare to a benchmark rate that depends on a potentially smaller number of samples (and is hence easier to achieve). The original definition of instance optimality (Definition 3.1) can be obtained by setting $\beta=1$ and $\gamma=0$.

In our work, for most settings of interest, we roughly achieve $\beta=1/(\log n)^{O(1)}$ and can take $\gamma$ to be an arbitrarily small polynomial in the inverse of the number of samples $1/n$, at a $\log(1/\gamma)$ cost to the multiplicative factor. We don’t view this as a significant issue since we expect the benchmark rate with $\tilde{O}(n/\log n)$ samples to behave asymptotically similarly to that with $n$ samples in most cases. We leave it as an open question as to whether the original definition of instance optimality can be achieved.
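To illustrate how the parameters $\beta$ and $\gamma$ absorb logarithmic losses, consider the following toy sketch. Both rate functions are hypothetical and chosen only for illustration (a benchmark of $1/\sqrt{n}$ and an algorithm that is a $\sqrt{\log n}$ factor worse, plus a small additive term); they are not rates established in this paper.

```python
import math

def benchmark_rate(n):
    """Hypothetical benchmark R_{N,n}(P) = 1/sqrt(n) for one fixed instance P."""
    return 1.0 / math.sqrt(n)

def algorithm_rate(n):
    """Hypothetical algorithm rate: a log factor worse, plus a tiny additive term."""
    return 3.0 * math.sqrt(math.log(n) / n) + n ** -2

def satisfies_relaxed(n, alpha, beta, gamma):
    """Checks R_{A,n}(P) <= alpha * R_{N, beta*n}(P) + gamma."""
    return algorithm_rate(n) <= alpha * benchmark_rate(beta * n) + gamma

n = 10_000
# Original definition (beta = 1, gamma = 0) with a constant alpha.
print(satisfies_relaxed(n, alpha=3.0, beta=1.0, gamma=0.0))
# Relaxed definition: compare against the benchmark with n / log(n) samples.
print(satisfies_relaxed(n, alpha=3.0, beta=1.0 / math.log(n), gamma=2.0 / n**2))
```

In this toy setting a constant $\alpha$ does not suffice under the original definition, whereas shrinking the benchmark sample size by a $\log n$ factor and allowing a polynomially small $\gamma$ makes the same $\alpha$ work.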

4 Additional Related Work

Instance Optimality for Differentially Private Statistics:

Several recent works have focused on formulating and giving ‘instance optimal’ differentially private algorithms for various statistical tasks. The work of McMillan, Smith and Ullman [MSU22] is most directly related to our work; they gave locally minimax optimal algorithms for parameter estimation for one-dimensional exponential families in the central model of differential privacy. The work of Duchi and Ruan [DR18] also gives locally-minimax optimal algorithms for various one-dimensional parameter estimation problems under the stronger constraint of local differential privacy. The notion of local minimax optimality both these papers use is based on the hardest one-dimensional sub-problem described in Section 3.1. While our results for density estimation in $\mathbb{R}^{1}$ satisfy this notion, they also satisfy a stronger notion described in Section 3.1. Additionally, as discussed in [MSU22], this definition is provably unsuitable for higher dimensions; we instead suggest a looser definition of instance optimality that is more promising in higher dimensions. More importantly, our paper is primarily focused on the non-parametric setting, and hence our techniques are different from the ones used in those papers, which focused primarily on parameter estimation.

Other Beyond Worst-Case Results in Central Differential Privacy:

Several additional works in the differential privacy literature study algorithms with accuracy that varies with the input dataset. Nearly all of them look at the empirical setting where we are concerned with the specific input dataset, rather than a distribution it may be drawn from. While initial algorithms in differential privacy added noise based on a worst-case notion of global sensitivity, these works give various algorithmic frameworks that help develop algorithms with guarantees that adapt to the hardness of the input dataset. These include algorithms based on smooth sensitivity [NRS07, BS19], the propose-test-release framework [DL09, BA20], Lipschitz extensions [BBDS13, KNRS13, CZ13, RS16], and sensitivity pre-processing [CD20]. However, none of these works study a formal notion of instance optimality.

In contrast, some more recent works do study definitions of instance optimality in the empirical setting. A work of Asi and Duchi [AD20] studies two notions of instance optimality: one by comparing the performance of an algorithm on a dataset against the performance of the best unbiased algorithm on that dataset, and another based on an analogue of the ‘hardest one-dimensional sub-problem’ for the empirical setting (they compare the performance of an algorithm on a dataset with all benchmark algorithms that know that the input dataset is either of two possible datasets but whose performance is evaluated as the worst case over the two datasets). They give a general mechanism known as the inverse sensitivity mechanism that they show is nearly instance optimal under these definitions for various problems such as median and mean estimation. Our work is focused on population quantities as opposed to empirical quantities; while these are related, they can be very different. For example, as pointed out in McMillan, Smith and Ullman [MSU22], using the inverse sensitivity mechanism in [AD20] to estimate the mean of a Gaussian (by using a locally minimax optimal algorithm for empirical mean) will result in infinite mean squared error, whereas other approaches that reason directly about the population quantities can get much better error.

In [DKSS23] and [HLY21], different notions of instance optimality are defined. Roughly, they compare the performance of an algorithm on a dataset with a benchmark algorithm that knows the input dataset but whose performance is evaluated as the worst-case performance over large subsets of the input dataset. While the details of the definitions in these papers vary slightly, both papers give instance-optimal algorithms for mean estimation under their respective definitions. For one-dimensional distributions, our algorithmic technique at a high level shares ideas with these algorithms—the algorithms in their papers try to adapt to the range of values in the dataset, whereas we try to adapt to the level of concentration of the distribution. However, the details of how this is done and the associated analyses vary. Our algorithm for general metric spaces uses different techniques. Our work differs from these works in a few other prominent ways: firstly, they are primarily concerned with estimating functionals of the underlying dataset, whereas we are concerned with density estimation in Wasserstein distance—these are problems with different output types and different error metrics. Finally, it is not clear if notions such as subset-based instance optimality that are well defined in the empirical setting transfer meaningfully to the distributional setting.

Instance-Optimal Statistical Estimation without Privacy Constraints:

Donoho and Liu [DL91] formulated the notion of the ‘hardest one-dimensional sub-problem’ as a way of capturing instance optimality for statistical estimation and gave non-private instance optimal algorithms for some one-dimensional parameter estimation problems. Cai and Low [CL15] formulated an instance-optimality type definition for non-parametric estimation problems. Our results for Wasserstein density estimation over $\mathbb{R}$ use a stronger version of this notion of instance optimality. In higher dimensions, this notion is provably unachievable, and so we define a different notion.

The other line of work most related to ours is on instance-optimal learning of discrete distributions [OS15, VV16, HO19]. In their setting, instance optimality is defined by comparing the performance of an algorithm on a discrete distribution $P$ to the minimax error of any algorithm on the class of discrete distributions with probability vectors that are permutations of the probability vector of $P$. We note that this notion is not well suited to many metric spaces, because permutations may not preserve properties such as concentration of the distribution, and hence this notion of instance optimality may provide an overly pessimistic view of the performance of an algorithm. Our notion of instance optimality (in terms of the $D_{\infty}$ neighborhood) compares against algorithms with a different type of prior knowledge, namely the location of where the distribution concentrates and approximate values of the probabilities at each point. We note that these notions are technically incomparable, and may be useful in different settings. For estimation in Wasserstein distance, knowledge of where the distribution is concentrated could be very useful in algorithm design, and so comparing to algorithms with this type of knowledge is more appropriate. See Section 3 for more discussion.

Finally, there is another line of work on getting similar instance optimal guarantees for other statistical problems [ADJ+11, ADJ+12, AJOS13b, AJOS13a]. For the closeness testing problem (given two sequences, determine if they are produced by the same distribution, or different distributions), Acharya, Das, Jafarpour, Orlitsky, Pan and Suresh [ADJ+11, ADJ+12] developed a test (without any knowledge about the generating distributions) that achieves the same error with O(n3/2)𝑂superscript𝑛32O(n^{3/2})italic_O ( italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT ) samples that an optimal label-invariant test that knows the distributions p𝑝pitalic_p and q𝑞qitalic_q would achieve with n𝑛nitalic_n samples.

Other work on Differentially Private Statistics:

There is a lot of other work on private statistical estimation, and we survey the most relevant parts of the literature here. There is a long line of work on minimax parameter/distribution estimation on various parametric distribution families: product distributions [BUV18, KLSU19, ASZ20, CWZ19, Sin23], Gaussian, sub-Gaussian distributions (and more generally exponential families) [KV18, KLSU19, AAK21, BGS+21, KMS+22b, KMS22a, HKM22, KMV22, AL22, LKO22, TCK+22, HKMN23, AKT+23, BHS23, KDH23], mixtures of Gaussian distributions [KSSU19, AAL23b, AAL23a], heavy-tailed distributions [KSU20, Nar23], discrete distributions with finite support [DHS15, ASZ20], distributions with finite covers [BKSW21] and more. This line of work focuses on minimax guarantees in the parametric setting, i.e. optimizing the worst-case error of an algorithm over the entire class of distributions. Our work, on the other hand, works in the non-parametric setting where we do not make assumptions about the distribution the dataset is drawn from, but instead give ‘instance-optimal’ algorithms that adapt to the hardness of the distribution the input dataset is drawn from.

There is also a line of work on differentially private CDF estimation [DNPR10, CSS11, BNS16, BNSV15, ALMM19, KLM+20, CLN+23], and quantile estimation [KSS22, GJK21, ASSU24]. Our algorithm for density estimation over $\mathbb{R}$ uses a quantile estimation algorithm (based on a CDF estimator) as a subroutine. Finally, there is a line of work on differentially private testing [ASZ17, CDK17, CKM+19], and the work characterizing the sample complexity of simple hypothesis tests forms an important part of our analysis of the instance-optimal rate for distributions over $\mathbb{R}$.

Work on Estimation in Wasserstein Distance:

In addition to the recent works [BSV22, HVZ23] on private Wasserstein learning on $[0,1]^d$, there is a plethora of works studying it in the non-private setting.

One line of work studies the convergence in Wasserstein distance of the empirical measure (on $n$ samples) to the true measure, as a function of the measure and the number of samples $n$ [Dud69, DY95, CR12, DSS11, BG14, FG15, BL19, WB19, Lei20, Fou23]. Some of the later works above can be viewed as studying this problem from a beyond worst-case analysis viewpoint. They give upper and lower bounds for the expected value of this quantity, in terms of various notions of ‘dimension’ of the underlying measure, such as the covering number of the support of the distribution, the upper and lower ‘Wasserstein dimensions’ of the measure, and others. Our work shows that the empirical measure, appropriately massaged, is approximately instance-optimal for density estimation without privacy constraints (for the notions of instance optimality we consider), and hence these works give us a handle on the instance-optimal rate as a function of the distribution and sample size $n$. Some more recent work studies minimax estimation in Wasserstein distance [SP19, NWB19], and shows that without additional assumptions on the distribution, the empirical measure is minimax optimal. Our work extends this result to show that in the general non-parametric setting, the empirical measure is also approximately instance-optimal; to the best of our knowledge, instance-optimal estimation in Wasserstein distance (even without privacy constraints) has not been previously studied.

5 Distribution Estimation on Hierarchically Separated Trees

Let us now turn to distribution estimation on arbitrary finite metric spaces. We will use the fact that any metric on a finite space can be embedded in a hierarchically separated tree (HST) metric to reduce the problem of density estimation in Wasserstein distance on an arbitrary metric space to density estimation in Wasserstein distance on an HST. In Section 5.2 we’ll characterise the target estimation rate $\mathcal{R}_{\mathcal{N},n}$. In Section 5.3, we’ll then provide an $\varepsilon$-DP algorithm and prove that it achieves this target estimation rate up to logarithmic factors.

5.1 Preliminaries on Hierarchically Separated Trees

A key component of our proof strategy is the reduction to Hierarchically Separated Trees (HSTs). HSTs are a special class of tree metrics that are able to embed arbitrary metric spaces with low distortion. They are particularly well-behaved when working with the Wasserstein distance since the Wasserstein distance on an HST has a simple closed form.

Definition 5.1 (Hierarchically Separated Tree).

A hierarchically separated tree (HST) is a rooted weighted tree such that the edges between level $\ell$ and $\ell-1$ all have the same weight (denoted $r_\ell$) and the weights are geometrically decreasing, so $r_{\ell+1} = (1/2) r_\ell$. Let $\mathrm{depth}$ be the depth of the tree.

An HST defines a metric on its leaf nodes by defining the distance between any two leaf nodes to be the weight of the minimum weight path between the two nodes. We will rely on two main facts about HSTs in this work.

Lemma 5.2 (Low distortion metric embeddings [FRT03]).

Let $(V,d)$ be a metric space with $M$ points. There exists a randomized, polynomial-time algorithm that produces an HST where the leaf nodes of the tree correspond to the elements of the metric space and the induced tree metric $d_T$ is such that for all $u,v \in V$:

  • $d(u,v) \leq d_T(u,v)$

  • $\mathbb{E}[d_T(u,v)] \leq O(\log M) \cdot d(u,v)$

The depth of the HST is logarithmic in the size of the metric space, $\mathrm{depth} = \log M$.

An immediate consequence of the $O(\log M)$ metric distortion in Lemma 5.2 is that the Wasserstein distance in the original metric space is also preserved up to an $O(\log M)$ factor in expectation. Thus, Lemma 5.2 allows us to translate the problem of learning densities on an arbitrary metric space in Wasserstein distance to learning densities in Wasserstein distance on an HST. This is a useful tool since HST metrics are generally easier to work with and, as we’ll see below, the Wasserstein distance is particularly well-behaved on an HST. In order to use Lemma 5.2 to translate the problem of density estimation on a bounded ball in $\mathbb{R}^d$ into density estimation on an HST, one discretizes the metric, paying a small additive term.

Corollary 5.3.

Given $\alpha > 0$, there is a probabilistic embedding $f$ of $[0,1]^d$ into an HST such that for all $x,y \in [0,1]^d$:

  • $d(x,y) - \alpha \leq d_T(f(x),f(y))$

  • $\mathbb{E}[d_T(f(x),f(y))] \leq O\left(d \cdot \log\frac{1}{\alpha}\right) \cdot (d(x,y)+\alpha)$

The distortion is logarithmic in $\frac{1}{\alpha}$, so taking $\alpha$ to be polynomially small (in $n$), one gets the distortion to be $O(d \log n)$. It is easy to see that this implies that the Wasserstein distance is preserved in both directions up to a factor of $O(d \log\frac{1}{\alpha})$, up to an additive error of $\alpha$.

A distribution $P$ on the underlying metric space in an HST induces a function $\mathfrak{G}_P$ on the nodes of the tree, where the value of a node $\nu$ is given by the weight in $P$ of the leaf nodes in the subtree rooted at $\nu$. For every level $\ell \in [\mathrm{depth}]$ of the tree, let $P_\ell$ be the distribution induced on the nodes at level $\ell$, where the probability of node $\nu$ is $\mathfrak{G}_P(\nu)$. Thus $P_\ell$ is a discrete distribution on a domain of size $N_\ell$, where $N_\ell$ is the number of nodes in level $\ell$ of the tree.

Lemma 5.4 (Closed form Wasserstein distance formula).

Given two distributions $P$ and $Q$ defined on an HST metric space, the Wasserstein distance between $P$ and $Q$ has the closed formula:

$$\mathcal{W}(P,Q) = \frac{1}{2}\sum_{\nu} r_\nu \left|\mathfrak{G}_P(\nu) - \mathfrak{G}_Q(\nu)\right| = \sum_{\ell} r_\ell \, \mathrm{TV}(P_\ell, Q_\ell),$$

where $r_\nu$ is the length of the edge connecting $\nu$ to its parent, and the sum is over all nodes in the tree.
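As an illustration of Lemma 5.4, the following Python sketch evaluates the level-wise form of the closed formula on a complete HST. The complete tree shape with a fixed branching factor, the left-to-right leaf ordering, the top edge weight `r1`, and the names `level_masses` and `hst_wasserstein` are assumptions made for this example only; the root is taken to be level 0, so the edges into level $\ell$ have weight $r_\ell = r_1 2^{-(\ell-1)}$.

```python
def level_masses(leaf_probs, branching, depth):
    """Subtree masses G_P(nu) for every level 1..depth of a complete tree.

    Leaves sit at level `depth`; consecutive blocks of `branching`
    nodes at one level share a parent at the level above.
    """
    if len(leaf_probs) != branching ** depth:
        raise ValueError("expected branching**depth leaf probabilities")
    masses = {depth: list(leaf_probs)}
    for level in range(depth - 1, 0, -1):
        children = masses[level + 1]
        masses[level] = [sum(children[i:i + branching])
                         for i in range(0, len(children), branching)]
    return masses


def hst_wasserstein(p, q, branching, depth, r1=1.0):
    """Evaluate sum_l r_l * TV(P_l, Q_l) with r_l = r1 * 2**-(l-1)."""
    gp = level_masses(p, branching, depth)
    gq = level_masses(q, branching, depth)
    total = 0.0
    for level in range(1, depth + 1):
        r_level = r1 * 0.5 ** (level - 1)
        l1 = sum(abs(a - b) for a, b in zip(gp[level], gq[level]))
        total += r_level * 0.5 * l1  # TV is half the L1 distance
    return total


if __name__ == "__main__":
    p = [1.0, 0.0, 0.0, 0.0]
    # Point masses on sibling leaves: only the leaf level differs.
    print(hst_wasserstein(p, [0.0, 1.0, 0.0, 0.0], branching=2, depth=2))
    # Point masses in different top-level subtrees: both levels differ.
    print(hst_wasserstein(p, [0.0, 0.0, 1.0, 0.0], branching=2, depth=2))
```

The second value is larger than the first because the two distributions already disagree at level 1, where the edge weight is largest.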

5.2 The Target Estimation Rate

Recall the definition of our neighbourhood.

$$\mathcal{N}(P) = \{Q \in \mathcal{P} \;|\; D_\infty(P,Q) \leq \ln 2\}$$

We call a node $\nu$ an $\alpha$-active node under the distribution $P$ if the weight in $P$ of the subtree rooted at $\nu$ is greater than $\alpha$. Let $\gamma_P(\alpha)$ be the set of $\alpha$-active nodes under $P$ and $\gamma_{P_\ell}(\alpha)$ be the set of $\alpha$-active nodes at level $\ell$.

Theorem 5.5.

Given a distribution $P$ on $[N]$, $\varepsilon > 0$, $\delta \in [0,1]$, and $n \in \mathbb{N}$, let $\kappa = \frac{1}{10\varepsilon n}\min\left\{W\left(\frac{0.45\varepsilon}{\delta}\right), 0.6\right\}$, where $W(x)$ is the Lambert W function, so $W(x)e^{W(x)} = x$. Then

$$\mathcal{R}_{\mathcal{N},n,\varepsilon}(P) = \Omega\left(\max_{\ell} r_\ell \sum_{x \in [N_\ell]} \min\left\{P_\ell(x)(1-P_\ell(x)), \sqrt{\frac{P_\ell(x)(1-P_\ell(x))}{n}}\right\} + \sum_{x \notin \gamma_{P_\ell}(2\kappa)} P_\ell(x) + \left(|\gamma_{P_\ell}(2\kappa)| - 1\right)\kappa\right),$$

where the max is over all the levels of the tree.

Note that $\kappa \approx \frac{1}{\varepsilon n}\min\{\log(1/\delta), 1\}$, so the dependence on $\varepsilon$ and $n$ in Theorem 5.5 matches the upper bound in Theorem 5.13. The error rate $\mathcal{R}_{\mathcal{N},n,\varepsilon}$ does indeed adapt to easy instances, as we expected. The error decomposes into three components. The first component is the non-private sampling error: the error that would occur even if privacy were not required. The second component indicates that we cannot estimate the value of nodes that have probability less than $1/(\varepsilon n)$. The third component is the error due to privacy on the active nodes. If $P$ is highly concentrated then we expect most nodes to either be $\frac{1}{\varepsilon n}$-active or have weight 0, so the first two terms in $\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)$ are small. There should also be few active nodes, making the last term smaller as well. Conversely, if $P$ has a large region of low density then we expect a large number of inactive nodes, as well as non-zero inactive nodes that are at higher levels of the tree and hence contribute more to the final term. Thus, in distributions with high dispersion we expect the right hand side to be large.
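To make the threshold $\kappa$ from Theorem 5.5 concrete, the sketch below evaluates it numerically. The helper names `lambert_w` and `kappa`, the use of Newton's method for the principal branch of $W$, and the convention that $\delta = 0$ makes the minimum equal to 0.6 (reading $W(0.45\varepsilon/\delta)$ as $+\infty$) are assumptions of this illustration rather than part of the paper.

```python
import math


def lambert_w(x, iters=60):
    """Principal branch of the Lambert W function for x >= 0.

    Solves w * exp(w) = x by Newton's method, started at log(1 + x),
    which is an upper bound on W(x) for x >= 0.
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w


def kappa(eps, delta, n):
    """kappa = min{W(0.45 * eps / delta), 0.6} / (10 * eps * n)."""
    cap = 0.6
    # W is increasing on [0, inf), so W(x) >= cap exactly when
    # x >= cap * exp(cap); in that case the minimum is the cap itself.
    if delta == 0 or 0.45 * eps / delta >= cap * math.exp(cap):
        m = cap
    else:
        m = lambert_w(0.45 * eps / delta)
    return m / (10.0 * eps * n)


if __name__ == "__main__":
    print(kappa(eps=1.0, delta=0.0, n=100))   # pure DP: the cap 0.6 applies
    print(kappa(eps=1.0, delta=0.45, n=100))  # W(1) is below the cap
```

Short-circuiting on the cap avoids evaluating $W$ at very large arguments, where the Newton iteration in this sketch could overflow.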

The proof of Theorem 4.1 will involve two main steps. First, we will reduce the lower bound on the HST to a lower bound on a star metric, or equivalently estimation of a discrete distribution in TV distance. We’ll then use a variant of Assouad’s inequality to prove the lower bounds on estimating discrete distributions in TV distance.

5.2.1 Reduction to Estimation in TV distance of Discrete Distributions

The key observation is that in order to estimate the distribution well in Wasserstein distance, an algorithm must estimate each level of the tree well in TV distance. Any estimate of $P$ also induces an estimate of $P_\ell$; let $\hat{P}$ be an estimate of the distribution $P$ and $\hat{P}_\ell$ be the induced estimate of the distribution at level $\ell$. Then for any distribution $P$,

$$\mathcal{W}(P,\hat{P}) = \sum_{\ell \in [\mathrm{depth}]} r_\ell \, \mathrm{TV}(P_\ell, \hat{P}_\ell).$$

The following observation ensures that our notions of instance optimality in both the Wasserstein metric and the per-level TV distance are compatible at every level $\ell$.

Theorem 5.6.

For every level $\ell \in [\mathrm{depth}]$, define the neighborhood of $P_\ell$ as $\mathcal{N}_\ell : \Delta([N_\ell]) \to \mathfrak{P}(\Delta([N_\ell]))$ by $\mathcal{N}_\ell(P_\ell) = \{Q_\ell \;|\; D_\infty(P_\ell, Q_\ell) \leq \ln 2\}$. Then,

$$\mathcal{R}_{\mathcal{N},n,\varepsilon}(P) \geq \max_{\ell \in [\mathrm{depth}]} r_\ell \cdot \mathcal{R}_{\mathcal{N}_\ell,n,\varepsilon}(P_\ell),$$

where the error of $P$ is measured in the Wasserstein distance and the error of $P_\ell$ is measured in the TV distance.

Recall that $\mathcal{R}_{\mathcal{N}_\ell,n,\varepsilon}(P_\ell)$ is the optimal estimation rate with respect to $\mathcal{N}_\ell$, where the error is measured in total variation distance. The proof of Theorem 5.6 can be found in Appendix C.

5.2.2 Characterizing Target Estimation Rate for Discrete Distributions

In light of Theorem 5.6, we will focus on characterizing the difficulty of estimating the distribution at a single level of the tree for the remainder of this section. Since this is fundamentally a statement about estimating discrete distributions in TV distance, we will state everything in this section in terms of general discrete distributions. Let $N \in \mathbb{N}$, and let $P$ be a distribution on $[N]$. Define $\mathcal{N}(P) = \{Q \;|\; D_\infty(P,Q) \leq \ln 2\}$. Our goal is to give a lower bound for $\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)$, where the metric is the TV distance.

Theorem 5.7.

Given $\varepsilon > 0$ and $\delta \in [0,1]$, let $\kappa = \frac{1}{10\varepsilon n}\min\left\{W\left(\frac{0.45\varepsilon}{\delta}\right), 0.6\right\}$, where $W(x)$ is the Lambert W function, so $W(x)e^{W(x)} = x$. Given a distribution $P$,

$$\mathcal{R}_{\mathcal{N},n,\varepsilon}(P) = \Omega\left(\sum_{x \in [N]} \min\left\{P(x)(1-P(x)), \sqrt{\frac{P(x)(1-P(x))}{n}}\right\} + \sum_{x \notin \gamma_P(2\kappa)} P(x) + \left(|\gamma_P(2\kappa)| - 1\right)\kappa\right)$$
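The three terms inside the $\Omega(\cdot)$ of Theorem 5.7 can be evaluated directly for a given probability vector. The sketch below does this while ignoring the unspecified constant; the function name `lower_bound_terms` and the choice to pass $\kappa$ in as an argument are assumptions made for illustration. Following the definition of $\alpha$-active nodes, an element counts as $2\kappa$-active when its mass is strictly greater than $2\kappa$.

```python
import math


def lower_bound_terms(p, n, kappa):
    """Return the (sampling, inactive-mass, privacy) terms of Theorem 5.7.

    p     : list of probabilities P(x) for x in [N]
    n     : number of samples
    kappa : the threshold from the theorem (hypothetical input here)
    """
    # Non-private sampling error: sum_x min{P(x)(1-P(x)), sqrt(P(x)(1-P(x))/n)}.
    sampling = sum(min(px * (1 - px), math.sqrt(px * (1 - px) / n)) for px in p)
    # Total mass of the elements that are not 2*kappa-active.
    inactive_mass = sum(px for px in p if px <= 2 * kappa)
    # Privacy error on the active elements: (|gamma_P(2 kappa)| - 1) * kappa.
    num_active = sum(1 for px in p if px > 2 * kappa)
    privacy = (num_active - 1) * kappa
    return sampling, inactive_mass, privacy


if __name__ == "__main__":
    print(lower_bound_terms([1.0, 0.0, 0.0, 0.0], n=100, kappa=0.01))
    print(lower_bound_terms([0.25, 0.25, 0.25, 0.25], n=100, kappa=0.01))
```

For a point mass all three terms vanish, while for the uniform distribution on four points the sampling and privacy terms are both positive, which is in line with the discussion of concentrated versus dispersed instances.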

Theorem 5.5 follows immediately from Theorem 5.6 and Theorem 5.7. The main tool we will use is a differentially private version of Assouad’s method. This gives us a method for lower bounding the error by constructing nets of distributions that are pairwise far in the relevant metric of interest, which for us is the TV distance. The following is a slight variant of the differentially private version of Assouad’s lemma given in [ASZ21]. Rather than building a set of distributions indexed by a hypercube, we will build a set of distributions indexed by a product of hypercubes. Since this is an extension of the version that appears in [ASZ21], we include a proof in Appendix C for completeness.

Lemma 5.8.

[An extension of $(\varepsilon,\delta)$-DP Assouad’s method [ASZ21]] Let $k_0, k_1, \cdots$ be a sequence of natural numbers such that $\sum_s k_s < \infty$, $\varepsilon > 0$ and $\delta \in [0,1]$. Given a family of distributions $\mathcal{P} \subset \Delta(\mathcal{X})$ on a space $\mathcal{X}$ and a parameter $\theta : \mathcal{P} \to \mathcal{M}$, where $\mathcal{M}$ is a metric space with metric $d$, suppose that there exists a set $\mathcal{V} \subset \mathcal{P}$ of distributions indexed by the product of hypercubes $\mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots$, where $\mathcal{E}_k := \{\pm 1\}^k$, such that for a sequence $\tau_0, \tau_1, \cdots$,

$$\forall (u^0,u^1,\cdots), (v^0,v^1,\cdots) \in \mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots, \quad d(\theta(p_u),\theta(p_v)) \geq 2\sum_{s} \tau_s \sum_{j=1}^{k_s} \chi_{u^s_j \neq v^s_j}. \tag{5}$$

For each coordinate $s \in \mathbb{N}$, $j \in [k_s]$, consider the mixture distributions obtained by averaging over all distributions with a fixed value at the $(s,j)$th coordinate:

$$p_{+(s,j)} = \frac{2}{|\mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots|} \sum_{u \in \mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots \,:\, u^s_j = +1} p_u, \qquad p_{-(s,j)} = \frac{2}{|\mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots|} \sum_{u \in \mathcal{E}_{k_0} \times \mathcal{E}_{k_1} \times \cdots \,:\, u^s_j = -1} p_u,$$

and let ϕs,j:𝒳n{1,+1}:subscriptitalic-ϕ𝑠𝑗superscript𝒳𝑛11\phi_{s,j}:\mathcal{X}^{n}\to\{-1,+1\}italic_ϕ start_POSTSUBSCRIPT italic_s , italic_j end_POSTSUBSCRIPT : caligraphic_X start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT → { - 1 , + 1 } be a binary classifier. Then

min𝒜 is (ε,δ)-DPmaxp𝒱𝒜,n(p)12sτsj=1ksminϕs,j is (ε,δ)-DP(PrXp+(s,j)n(ϕs,j(X)1)+PrXp(s,j)n(ϕs,j(X)1)),subscript𝒜 is 𝜀𝛿-DPsubscript𝑝𝒱subscript𝒜𝑛𝑝12subscript𝑠subscript𝜏𝑠superscriptsubscript𝑗1subscript𝑘𝑠subscriptsubscriptitalic-ϕ𝑠𝑗 is 𝜀𝛿-DPsubscriptprobabilitysimilar-to𝑋superscriptsubscript𝑝𝑠𝑗𝑛subscriptitalic-ϕ𝑠𝑗𝑋1subscriptprobabilitysimilar-to𝑋superscriptsubscript𝑝𝑠𝑗𝑛subscriptitalic-ϕ𝑠𝑗𝑋1\min_{\mathcal{A}\text{ is }(\varepsilon,\delta)\text{-DP}}\max_{p\in\mathcal{% V}}\mathcal{R}_{\mathcal{A},n}(p)\geq\frac{1}{2}\sum_{s}\tau_{s}\sum_{j=1}^{k_% {s}}\min_{\phi_{s,j}\text{ is }(\varepsilon,\delta)\text{-DP}}(\Pr_{X\sim p_{+% (s,j)}^{n}}(\phi_{s,j}(X)\neq 1)+\Pr_{X\sim p_{-(s,j)}^{n}}(\phi_{s,j}(X)\neq-% 1)),roman_min start_POSTSUBSCRIPT caligraphic_A is ( italic_ε , italic_δ ) -DP end_POSTSUBSCRIPT roman_max start_POSTSUBSCRIPT italic_p ∈ caligraphic_V end_POSTSUBSCRIPT caligraphic_R start_POSTSUBSCRIPT caligraphic_A , italic_n end_POSTSUBSCRIPT ( italic_p ) ≥ divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUPERSCRIPT roman_min start_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_s , italic_j end_POSTSUBSCRIPT is ( italic_ε , italic_δ ) -DP end_POSTSUBSCRIPT ( roman_Pr start_POSTSUBSCRIPT italic_X ∼ italic_p start_POSTSUBSCRIPT + ( italic_s , italic_j ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ϕ start_POSTSUBSCRIPT italic_s , italic_j end_POSTSUBSCRIPT ( italic_X ) ≠ 1 ) + roman_Pr start_POSTSUBSCRIPT italic_X ∼ italic_p start_POSTSUBSCRIPT - ( italic_s , italic_j ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ϕ start_POSTSUBSCRIPT italic_s , italic_j end_POSTSUBSCRIPT ( italic_X ) ≠ - 1 ) ) ,

where the minimum on the left-hand side is over all $(\varepsilon,\delta)$-DP mechanisms, and the minimum on the right-hand side is over all $(\varepsilon,\delta)$-DP binary classifiers. Moreover, if for all $s\in\mathbb{N}$, $j\in[k_{s}]$, there exists a coupling $(X,Y)$ between $p_{+(s,j)}^{n}$ and $p_{-(s,j)}^{n}$ with $\mathbb{E}[d_{Ham}(X,Y)]\leq D_{s}$, then

\[\min_{\mathcal{A}\text{ is }(\varepsilon,\delta)\text{-DP}}\max_{p\in\mathcal{V}}\mathcal{R}_{\mathcal{A},n}(p)\geq\sum_{s}\frac{k_{s}\tau_{s}}{2}\left(0.9e^{-10\varepsilon D_{s}}-10D_{s}\delta\right).\]

Note that an upper bound $\text{\rm TV}(P_{i},P_{j})\leq\gamma$ implies that there exists a coupling $(X,Y)$ between $P_{i}^{n}$ and $P_{j}^{n}$ such that $\mathbb{E}[d_{Ham}(X,Y)]\leq n\gamma$.
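The coupling in this remark can be made concrete for discrete distributions. The sketch below is illustrative only: it assumes TV denotes half the $\ell_1$ distance, and the helper names (`total_variation`, `maximal_coupling`, `expected_hamming`) are hypothetical. It builds a maximal coupling of a single draw, under which $\Pr[X\neq Y]=\text{\rm TV}$, and then uses linearity of expectation over $n$ independently coupled coordinates.

```python
from fractions import Fraction


def total_variation(p, q):
    """TV distance (assumed here to be half the L1 distance) between two pmfs given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def maximal_coupling(p, q):
    """Joint pmf {(x, y): mass} with marginals p and q and Pr[X != Y] = TV(p, q)."""
    support = sorted(set(p) | set(q))
    joint = {}
    # Put as much mass as possible on the diagonal X == Y.
    for x in support:
        shared = min(p.get(x, 0), q.get(x, 0))
        if shared > 0:
            joint[(x, x)] = shared
    tv = total_variation(p, q)
    if tv > 0:
        # Spread the excess of p over the excess of q proportionally.
        for x in support:
            excess_p = max(p.get(x, 0) - q.get(x, 0), 0)
            if excess_p == 0:
                continue
            for y in support:
                excess_q = max(q.get(y, 0) - p.get(y, 0), 0)
                if excess_q == 0:
                    continue
                joint[(x, y)] = joint.get((x, y), 0) + excess_p * excess_q / tv
    return joint


def expected_hamming(p, q, n):
    """E[d_Ham(X, Y)] when each of the n coordinates is coupled independently."""
    joint = maximal_coupling(p, q)
    mismatch = sum(mass for (x, y), mass in joint.items() if x != y)
    return n * mismatch


# Illustrative distributions (not taken from the paper).
p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
print(total_variation(p, q), expected_hamming(p, q, 8))
```

Exact rational arithmetic is used so that the identity "expected Hamming distance equals $n$ times TV" can be checked without floating-point tolerance.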

We will separately prove that each of the three terms in Theorem 5.7 belongs in the lower bound. Each proof will follow the same underlying structure. Given a distribution $P$, the main technical step is carefully designing a family of distributions in $\mathcal{N}(P)$ that satisfies the conditions of Lemma 5.8. Lemma 5.9 and Lemma 5.10 give lower bounds on the noise due to privacy. Lemma 5.11 gives lower bounds based on the error due to sampling.

Let

\[\kappa=\frac{1}{10\varepsilon}\min\left\{W\left(\frac{0.45\varepsilon}{\delta}\right),0.6\right\},\]

where $W(x)\approx\ln x-(1-o(1))\ln\ln x$ is the Lambert W function satisfying $W(x)e^{W(x)}=x$. In both lemma proofs we will use the inequality that if $D\leq\kappa$, then

\begin{equation}
0.9e^{-10\varepsilon D}-10D\delta\geq e^{-10\varepsilon D}\left(0.9-W\left(\frac{0.45\varepsilon}{\delta}\right)e^{W\left(\frac{0.45\varepsilon}{\delta}\right)}\frac{\delta}{\varepsilon}\right)=e^{-10\varepsilon D}\left(0.9-\frac{0.45\varepsilon}{\delta}\frac{\delta}{\varepsilon}\right)\geq e^{-10\varepsilon D}0.45\geq 0.2 \tag{6}
\end{equation}
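As a numerical sanity check of Equation 6, the following sketch computes $\kappa$ and evaluates $0.9e^{-10\varepsilon D}-10D\delta$ for $D\leq\kappa$. It is an illustration under stated assumptions: the Lambert W function is approximated by a hand-written Newton iteration on the principal branch, the case $\delta=0$ is treated as $W(+\infty)=+\infty$, and the function names are hypothetical.

```python
import math


def lambert_w(x, iters=100):
    """Principal branch W(x) for x >= 0, solving w * exp(w) = x by Newton's method."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    if x == 0:
        return 0.0
    # log(1 + x) satisfies w * exp(w) >= x, so Newton decreases monotonically from here.
    w = math.log1p(x)
    for _ in range(iters):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= 1e-15 * max(1.0, abs(w)):
            break
    return w


def kappa(eps, delta):
    """kappa = (1 / (10 eps)) * min{W(0.45 eps / delta), 0.6}."""
    if delta == 0:
        cap = 0.6  # assumption: W(+infinity) = +infinity, so the minimum is 0.6
    else:
        cap = min(lambert_w(0.45 * eps / delta), 0.6)
    return cap / (10.0 * eps)


def privacy_factor(eps, delta, D):
    """Left-hand side of Equation 6."""
    return 0.9 * math.exp(-10.0 * eps * D) - 10.0 * D * delta


# Illustrative parameter values (not taken from the paper).
for eps, delta in [(0.1, 1e-6), (1.0, 1e-3), (0.01, 1.0)]:
    k = kappa(eps, delta)
    print(eps, delta, k, privacy_factor(eps, delta, k))
```

The factor is smallest at $D=\kappa$, where the chain in Equation 6 gives at least $0.45e^{-0.6}\approx 0.247$, so the printed values should all exceed $0.2$.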
Lemma 5.9.

Given a distribution $P$, $\varepsilon>0$, $\delta\in[0,1]$ and $n\in\mathbb{N}$,

\[\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq 0.1\left(\left|\gamma_{P}\left(\frac{2\kappa}{n}\right)\right|-1\right)\frac{\kappa}{n}.\]
Proof.

Let $L=|\gamma_{P}(2\kappa/n)|$ be the number of active nodes. If $L=1$ then the RHS is 0 and so we are done. Otherwise, assume $L>1$ and let $k=\lfloor L/2\rfloor\geq 1$. Using the notation from Lemma 5.8, let $k_{0}=k$ and $k_{s}=0$ for all $s>0$. We will drop the reference to $s$ in the notation since only $s=0$ is significant.

Pair up the active nodes to form $k$ pairs of active nodes denoted by $(a_{1}^{+},a_{1}^{-}),\cdots,(a_{k}^{+},a_{k}^{-})$. Given $u\in\mathcal{E}_{k}$, define the distribution $p_{u}$ as follows: for all $a_{j}^{b}\in\gamma_{P}(2\kappa/n)$, $p_{u}(a_{j}^{+})=P(a_{j}^{+})+(\kappa/n)$ and $p_{u}(a_{j}^{-})=P(a_{j}^{-})-(\kappa/n)$ if $u_{j}=+1$, and $p_{u}(a_{j}^{+})=P(a_{j}^{+})-(\kappa/n)$ and $p_{u}(a_{j}^{-})=P(a_{j}^{-})+(\kappa/n)$ if $u_{j}=-1$. For all other $x$, $p_{u}(x)=P(x)$. It is immediate that for all $u$, $p_{u}\in\mathcal{N}(P)$.
For any pair $u,v$, $d(\theta(p_{u}),\theta(p_{v}))=\text{\rm TV}(p_{u},p_{v})=d_{Ham}(u,v)(\kappa/n)$, so that Equation 5 is satisfied with $\tau=\frac{1}{2}(\kappa/n)$. Further, given $j\in[k]$, $p_{+j}$ and $p_{-j}$ only differ on the probability of $a_{j}^{+}$ and $a_{j}^{-}$, so $D/n=\max_{j}\text{\rm TV}(p_{+j},p_{-j})=\kappa/n$ and by Equation 6, $0.9e^{-10\varepsilon D}-10D\delta\geq 0.2$.
Noting that $k\geq(1/2)(|\gamma_{P}(2\kappa/n)|-1)$ completes the proof. ∎

Lemma 5.10.

For all $\varepsilon>0$, $\delta\in[0,1]$, $n\in\mathbb{N}$ and distributions $P$ on $[N]$, if $\kappa<n/2$, then

\[\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq\Omega\left(\sum_{x\notin\gamma_{P}(2\kappa)}P(x)\right).\]

Since $\kappa\leq\frac{1}{\varepsilon n}$, the condition $\kappa<n/2$ is mild. For example, it is satisfied whenever $\varepsilon>2/n^{2}$.

Similar to the proof of Lemma 5.9, we are going to pair up the coordinates and move mass between the coordinates to create the distributions indexed by the product of hypercubes. Since we want all the distributions we create to be in $\mathcal{N}(P)$, we will divide the space into scales such that all elements in the same scale have approximately the same probability of occurring. We will then move mass within these scales. For $s\in\mathbb{N}$, let $\mathcal{S}_{s}=\{x\in[N]\;|\;P(x)\in(2^{-s-1},2^{-s}]\}$.
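The partition into scales can be sketched in a few lines. The code below is illustrative and not part of the paper: it represents $P$ as a dictionary from elements to floating-point probabilities, and the helper names `scale_of` and `scales` are hypothetical. It uses `math.frexp` so that probabilities that are exact powers of two land on the correct side of the half-open interval $(2^{-s-1},2^{-s}]$.

```python
import math
from collections import defaultdict


def scale_of(prob):
    """Return the integer s with 2**(-s-1) < prob <= 2**(-s), for 0 < prob <= 1."""
    if not 0 < prob <= 1:
        raise ValueError("prob must lie in (0, 1]")
    # frexp gives prob = mantissa * 2**exponent with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(prob)
    # An exact power of two 2**(exponent-1) is the closed right endpoint of its scale.
    return 1 - exponent if mantissa == 0.5 else -exponent


def scales(P):
    """Group the support of P by scale; zero-probability elements belong to no scale."""
    buckets = defaultdict(list)
    for x, prob in P.items():
        if prob > 0:
            buckets[scale_of(prob)].append(x)
    return dict(buckets)


# Illustrative distribution on [5] (not taken from the paper).
P = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625, 5: 0.0625}
print(scales(P))
```

Within a single scale any two probabilities differ by a factor of less than 2, which is the property the construction relies on when it moves mass between paired elements.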

Proof.

Given $s\in\mathbb{N}$, let $\mathcal{S}_{s}^{\prime}=\mathcal{S}_{s}\cap\{x\;|\;P(x)\leq 2\kappa/n\}$ and $d_{s}=|\mathcal{S}_{s}^{\prime}|$.

Let us first consider the case that there exists a scale $s^{*}$ with $d_{s^{*}}=1$ and $P(x^{*})\geq\frac{1}{8}\sum_{x\notin\gamma_{P}(2\kappa)}P(x)$, where $x^{*}$ is the element in $\mathcal{S}_{s^{*}}^{\prime}$. Define $P^{\prime}$ by $P^{\prime}(x^{*})=(1/2)P(x^{*})$ and for all $x\neq x^{*}$, $P^{\prime}(x)=\frac{1-(1/2)P(x^{*})}{1-P(x^{*})}P(x)$.
Since $(1/2)P(x^{*})\leq 2\kappa/n\leq 1/2$, $P^{\prime}\in\mathcal{N}(P)$. In this case we will use Lemma 5.8 with $k_{0}=1$ and $k_{s}=0$ otherwise, where $\mathcal{E}_{k_{0}}$ corresponds to the set of distributions $\{P,P^{\prime}\}$. Then noting that $\text{\rm TV}(P,P^{\prime})=(1/2)P(x^{*})$ and using Equation 6 we have that $\tau=\frac{1}{4}P(x^{*})$ and $D=(1/2)P(x^{*})n\leq\kappa$ so that $\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq(1/8)P(x^{*})(0.2)=\Omega\left(\sum_{x\notin\gamma_{P}(2\kappa)}P(x)\right)$ and we are done.

Next suppose that for all scales $s$ such that $d_{s}=1$ we have $P(x^{*})<\frac{1}{8}\sum_{x\notin\gamma_{P}(2\kappa)}P(x)$. Let $s^{*}$ be the smallest $s$ such that $d_{s}=1$. Since the scales $2^{-s-1}$ are geometrically decreasing,

\[\sum_{s:d_{s}=1}\sum_{x\in\mathcal{S}_{s}\cap\{x|P(x)\leq 2\kappa/n\}}P(x)\leq 2\sum_{s:d_{s}=1}2^{-s-1}\leq 4\cdot 2^{-s^{*}-1}\leq\frac{1}{2}\sum_{x\notin\gamma_{P}(2\kappa)}P(x).\]

It follows that $\sum_{s:d_{s}>1}\sum_{x\in\mathcal{S}_{s}\cap\{x|P(x)<2\kappa/n\}}P(x)\geq(1/2)\sum_{x\notin\gamma_{P}(2\kappa)}P(x)$. Further,

\[\sum_{x\notin\gamma_{P}(2\kappa)}P(x)\leq 2\sum_{s:d_{s}>1}\sum_{x\in\mathcal{S}_{s}\cap\{x|P(x)<2\kappa/n\}}P(x)\leq 4\sum_{s:d_{s}>1}2^{-s-1}d_{s}\leq 16\sum_{s:d_{s}>1}2^{-s-1}\lfloor d_{s}/2\rfloor.\]

Thus we can (up to constants) ignore scales such that $d_{s}\leq 1$ and assume that $d_{s}$ is even for all scales.
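The last two inequalities in the chain preceding this remark use only the definition of the scales and an elementary bound on the floor function. The following derivation spells them out; it is added as a routine check and is not part of the original argument.

```latex
% Every x in scale \mathcal{S}_s satisfies P(x) \le 2^{-s} = 2\cdot 2^{-s-1},
% and at most d_s elements of \mathcal{S}_s have P(x) < 2\kappa/n, hence
\[
2\sum_{s:d_s>1}\;\sum_{x\in\mathcal{S}_s\cap\{x \mid P(x)<2\kappa/n\}}P(x)
\;\le\; 2\sum_{s:d_s>1} d_s\cdot 2\cdot 2^{-s-1}
\;=\; 4\sum_{s:d_s>1}2^{-s-1}d_s .
\]
% For an integer d \ge 2 we have \lfloor d/2\rfloor \ge 1, so
\[
d \;\le\; 2\lfloor d/2\rfloor+1 \;\le\; 3\lfloor d/2\rfloor \;\le\; 4\lfloor d/2\rfloor ,
\qquad\text{and therefore}\qquad
4\sum_{s:d_s>1}2^{-s-1}d_s \;\le\; 16\sum_{s:d_s>1}2^{-s-1}\lfloor d_s/2\rfloor .
\]
```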

Let $k_{s}=\lfloor d_{s}/2\rfloor$. Now, within each scale $\mathcal{S}_{s}^{\prime}$, pair the elements to form $k_{s}$ distinct pairs $(a_{s,j}^{+},a_{s,j}^{-})$. Given $(u^{0},u^{1},\cdots)\in\mathcal{E}_{k_{0}}\times\mathcal{E}_{k_{1}}\times\cdots$, define $p_{u}$ by $p_{u}(a_{s,j}^{+})=P(a_{s,j}^{+})+2^{-s-2}$ and $p_{u}(a_{s,j}^{-})=P(a_{s,j}^{-})-2^{-s-2}$ if $u^{s}_{j}=+1$, and $p_{u}(a_{s,j}^{+})=P(a_{s,j}^{+})-2^{-s-2}$ and $p_{u}(a_{s,j}^{-})=P(a_{s,j}^{-})+2^{-s-2}$ if $u^{s}_{j}=-1$. For all other elements, $p_{u}(x)=P(x)$. Then, it is easy to see that for all $u$, $p_{u}\in\mathcal{N}(P)$.
Further, using notation from Lemma 5.8, Equation 5 is satisfied with $\tau_{s}=\frac{1}{2}2^{-s-2}$ since $d(\theta(p_{u}),\theta(p_{v}))=\text{\rm TV}(p_{u},p_{v})=2\sum_{s}\tau_{s}d_{Ham}(u^{s},v^{s})$ and $D_{s}/n=\max_{j}\text{\rm TV}(p_{+(s,j)},p_{-(s,j)})=2^{-s-2}$. Every paired element satisfies $2^{-s-1}<P(x)\leq 2\kappa/n$, so $2^{-s-2}<\kappa/n$ and hence $D_{s}<\kappa$ whenever $k_{s}>0$.
By Equation 6, $0.9e^{-10\varepsilon D_{s}}-10D_{s}\delta\geq 0.2$ whenever $k_{s}>0$, so by Lemma 5.8 we have

\[\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq\sum_{s}\frac{1}{2}2^{-s-2}k_{s}(0.2)=\frac{0.2}{4}\sum_{s:d_{s}>1}2^{-s-1}\lfloor d_{s}/2\rfloor\geq\frac{0.2}{4\times 16}\sum_{x\notin\gamma_{P}(2\kappa)}P(x)\]

which completes the proof. ∎

Next we lower bound the statistical term.

Lemma 5.11.

For all $n\in\mathbb{N}$, $\varepsilon>0$, $\delta\in[0,1]$ and distributions $P$, if $n\geq 2$ and $\varepsilon>2/n$, then

\[\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq\mathcal{R}_{\mathcal{N},n}(P)\geq\Omega\left(\sum_{x\in[N]}\min\left\{P(x)(1-P(x)),\sqrt{\frac{P(x)(1-P(x))}{n}}\right\}\right).\]
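The sum inside the $\Omega(\cdot)$ of Lemma 5.11 is easy to evaluate for a concrete distribution. The sketch below is illustrative only: it represents $P$ as a list of probabilities and uses the hypothetical helper names `per_element_term` and `sampling_term`. It computes $\min\{p(1-p),\sqrt{p(1-p)/n}\}$ per element and sums the result, ignoring the unspecified constant hidden in the $\Omega$.

```python
import math


def per_element_term(p, n):
    """min{p(1-p), sqrt(p(1-p)/n)} for a single probability p and sample size n."""
    variance = p * (1.0 - p)
    return min(variance, math.sqrt(variance / n))


def sampling_term(P, n):
    """Sum of the per-element terms over a pmf given as a list of probabilities."""
    return sum(per_element_term(p, n) for p in P)


# Illustrative example (not taken from the paper): uniform distribution on 4 elements.
uniform = [0.25, 0.25, 0.25, 0.25]
print(sampling_term(uniform, n=100))
```

The minimum switches branches at $p(1-p)=1/n$: elements with variance below $1/n$ contribute $p(1-p)$, while heavier elements contribute the usual $\sqrt{p(1-p)/n}$ sampling error.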

To streamline the notation, we will use $L(x)$ to denote $\min\left\{x(1-x),\sqrt{\frac{x(1-x)}{n}}\right\}$. In order to prove Lemma 5.11, we will need the following standard result from the statistics literature, which allows us to lower bound the error of any simple classifier distinguishing two distributions $P$ and $Q$ in terms of the KL divergence between $P$ and $Q$. We give a specific result for distinguishing Bernoulli random variables since we will use this in the proof of Lemma 5.11.

Lemma 5.12.

Given any pair of distributions $P$ and $Q$ on the same domain,

\[
\min_{\phi}\left(\Pr_{X\sim P^{n}}(\phi(X)=1)+\Pr_{X\sim Q^{n}}(\phi(X)=-1)\right)\geq\frac{1}{2}\left(1-\sqrt{n\,\mathrm{KL}(P,Q)}\right),
\]

where the minimum is over all binary classifiers. In particular, if $P=\texttt{Bernoulli}(p-\alpha)$ and $Q=\texttt{Bernoulli}(p+\alpha)$ where $0\leq\alpha\leq\frac{1}{2}L(p)$ then

\[
\min_{\phi}\left(\Pr_{X\sim P^{n}}(\phi(X)=1)+\Pr_{X\sim Q^{n}}(\phi(X)=-1)\right)\geq 1/4,
\]

where again the minimum is over all binary classifiers.

The proof of Lemma 5.12 can be found in Appendix C.

Proof of Lemma 5.11.

As in the proof of Lemma 5.10, first suppose there exists a scale $s^{*}$ with $d_{s^{*}}=1$ and there exists $x^{*}\in\mathcal{S}_{s^{*}}$ such that

\[
\frac{1}{2}L(P(x^{*}))\geq\frac{1}{60}\sum_{x\in[N]}L(P(x)).
\]

Then define a distribution $P'$ by $P'(x^{*})=P(x^{*})-\frac{1}{2}L(P(x^{*}))$ and for all $x\neq x^{*}$, $P'(x)=\frac{1-P(x^{*})+\frac{1}{2}L(P(x^{*}))}{1-P(x^{*})}P(x)$. Then $P'\in\mathcal{N}(P)$ since $\frac{1}{2}L(P(x^{*}))<\frac{1}{2}\min\{P(x^{*}),(1-P(x^{*}))\}$. Then we will use Lemma 5.8 with $k_{0}=1$ and $k_{s}=0$ for $s>0$, and $\mathcal{E}_{k_{0}}$ corresponds to $\{P,P'\}$. Now,

\begin{align*}
\mathrm{KL}(P',P) &= \left(P(x^{*})-\frac{1}{2}L(P(x^{*}))\right)\ln\frac{P(x^{*})-\frac{1}{2}L(P(x^{*}))}{P(x^{*})}+\left(1-P(x^{*})+\frac{1}{2}L(P(x^{*}))\right)\ln\frac{1-P(x^{*})+\frac{1}{2}L(P(x^{*}))}{1-P(x^{*})}\\
&\leq \frac{1}{4n}
\end{align*}

(for more detail on the proof of this inequality see the proof of Lemma 5.12) so
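The inequality $\mathrm{KL}(P',P)\leq\frac{1}{4n}$ can be sanity-checked numerically. The sketch below is only an illustration and not the appendix argument: it evaluates the two-term expression above on a grid of values of $P(x^{*})$ and a few sample sizes. The function names `L`, `kl_shifted` and `max_n_times_kl`, the grid, and the chosen values of $n$ are assumptions made for this example.

```python
import math

def L(x, n):
    """L(x) = min{x(1-x), sqrt(x(1-x)/n)} as defined before Lemma 5.12."""
    return min(x * (1 - x), math.sqrt(x * (1 - x) / n))

def kl_shifted(p, n):
    """KL(P', P) when P' removes a = L(p)/2 from a point of mass p
    and rescales all remaining mass proportionally."""
    a = 0.5 * L(p, n)
    q = p - a  # mass of x* under P'; positive because a < p
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def max_n_times_kl(ns, grid=999):
    """Largest value of n * KL(P', P) over a grid of p in (0, 1)."""
    worst = 0.0
    for n in ns:
        for i in range(1, grid + 1):
            p = i / (grid + 1)
            worst = max(worst, n * kl_shifted(p, n))
    return worst

# Hypothetical sample sizes; the claim corresponds to the result being <= 1/4.
print(max_n_times_kl([2, 10, 1000]))
```

A grid check of this kind cannot replace the proof, but it makes the role of the factor $\frac{1}{2}$ in front of $L$ easy to experiment with.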

\[
\min_{\phi}\left(\Pr_{X\sim P^{n}}(\phi(X)=1)+\Pr_{X\sim {P'}^{n}}(\phi(X)=-1)\right)\geq\frac{1}{2}\left(1-\sqrt{n\,\mathrm{KL}(P,P')}\right)\geq 1/4
\]

and $\tau_{0}=\mathrm{TV}(P,P')=\frac{1}{2}L(P(x^{*}))$. Thus by Lemma 5.8,

\[
\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq\mathcal{R}_{\mathcal{N},n}(P)\geq\frac{1}{2}L(P(x^{*}))\frac{1}{4}\geq\frac{1}{480}\sum_{x\in[N]}\frac{1}{2}L(P(x)),
\]

and we are done.

On the other hand, suppose that for all scales $s$ such that $d_{s}=1$ we have

\[
L(P(x_{s}))\leq\frac{1}{30}\sum_{x\in[N]}L(P(x)),
\]

where $\mathcal{S}_{s}=\{x_{s}\}$. As in the proof of Lemma 5.10, we will argue that we can ignore any singleton scales, and assume that $d_{s}$ is even for all scales. Let $s^{*}=\min\{s>0\;|\;d_{s}=1\}$ so

\begin{align*}
\sum_{s:d_{s}=1}L(P(x_{s})) &\leq \chi_{d_{0}=1}L(P(x_{0}))+\sum_{s>0:d_{s}=1}\min\left\{2^{-s},\sqrt{\frac{2^{-s}}{n}}\right\}\\
&\leq \chi_{d_{0}=1}L(P(x_{0}))+(2+\sqrt{2})\min\left\{2^{-s^{*}},\sqrt{\frac{2^{-s^{*}}}{n}}\right\}\\
&\leq \chi_{d_{0}=1}L(P(x_{0}))+2(2+\sqrt{2})\min\left\{P(x_{s^{*}}),\sqrt{\frac{P(x_{s^{*}})}{n}}\right\}\\
&\leq \chi_{d_{0}=1}L(P(x_{0}))+4(2+\sqrt{2})L(P(x_{s^{*}}))\\
&\leq \frac{1+4(2+\sqrt{2})}{30}\sum_{x\in[N]}L(P(x)).
\end{align*}

Therefore, $\sum_{s:d_{s}>1}\sum_{x\in\mathcal{S}_{s}}L(P(x))\geq(1/2)\sum_{x\in[N]}L(P(x))$ and so

\begin{align*}
\sum_{s}L(2^{-s-1})\lfloor d_{s}/2\rfloor &\geq \sum_{s:d_{s}>1}L(2^{-s-1})\frac{1}{3}d_{s}\\
&\geq \sum_{s:d_{s}>1}\sum_{x\in\mathcal{S}_{s}}L(2^{-s-1})\frac{1}{3}\\
&\geq \frac{1}{3\sqrt{2}}\sum_{x\in[N]}L(P(x)) \tag{7}
\end{align*}

where the first inequality follows from ds/2(1/3)dssubscript𝑑𝑠213subscript𝑑𝑠{\lfloor{d_{s}/2}\rfloor}\geq(1/3)d_{s}⌊ italic_d start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT / 2 ⌋ ≥ ( 1 / 3 ) italic_d start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT whenever ds>1subscript𝑑𝑠1d_{s}>1italic_d start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT > 1, and the second follows because 2s1P(x)1/2superscript2𝑠1𝑃𝑥122^{-s-1}\leq P(x)\leq 1/22 start_POSTSUPERSCRIPT - italic_s - 1 end_POSTSUPERSCRIPT ≤ italic_P ( italic_x ) ≤ 1 / 2 for all x𝒮s𝑥subscript𝒮𝑠x\in\mathcal{S}_{s}italic_x ∈ caligraphic_S start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT such that ds>1subscript𝑑𝑠1d_{s}>1italic_d start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT > 1.
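The constants in the singleton-scale chain above can be traced back to two geometric series. The following is our reading of that step, which the text does not spell out. For any $s^{*}$,

\[
\sum_{s\geq s^{*}}2^{-s}=2\cdot 2^{-s^{*}},\qquad\sum_{s\geq s^{*}}\sqrt{\frac{2^{-s}}{n}}=\frac{1}{1-2^{-1/2}}\sqrt{\frac{2^{-s^{*}}}{n}}=(2+\sqrt{2})\sqrt{\frac{2^{-s^{*}}}{n}},
\]

and since a sum of minima is at most the minimum of the two sums,

\[
\sum_{s\geq s^{*}}\min\left\{2^{-s},\sqrt{\frac{2^{-s}}{n}}\right\}\leq(2+\sqrt{2})\min\left\{2^{-s^{*}},\sqrt{\frac{2^{-s^{*}}}{n}}\right\}.
\]

The final constant satisfies $\frac{1+4(2+\sqrt{2})}{30}\approx 0.489<\frac{1}{2}$, which is what leaves at least half of $\sum_{x\in[N]}L(P(x))$ on the scales with $d_{s}>1$.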

Assume that $d_{s}$ is even for all $s$. Within each scale $\mathcal{S}_{s}$, pair the elements to form $k_{s}=d_{s}/2$ distinct pairs $(a^{+}_{s,j},a^{-}_{s,j})$ per scale. For all $s\in\mathbb{N}$, let $\alpha_{s}=\frac{1}{2}L(2^{-s-1})$, and note that for all $x\in\mathcal{S}_{s}$ and $s>0$, $\alpha_{s}\leq\frac{1}{2}L(P(x))$. Given $(u^{0},u^{1},\cdots)\in\mathcal{E}_{k_{0}}\times\mathcal{E}_{k_{1}}\times\cdots$, define $p_{u}$ by $p_{u}(a_{s,j}^{+})=p(a_{s,j}^{+})+\alpha_{s}$ and $p_{u}(a_{s,j}^{-})=p(a_{s,j}^{-})-\alpha_{s}$ if $u^{s}_{j}=+1$ and $p_{u}(a_{s,j}^{+})=p(a_{s,j}^{+})-\alpha_{s}$ and $p_{u}(a_{s,j}^{-})=p(a_{s,j}^{-})+\alpha_{s}$ if $u^{s}_{j}=-1$. For all other elements, $p_{u}(x)=p(x)$. Then, for all $u$, $p_{u}\in\mathcal{N}(P)$. Further, using notation from Lemma 5.8, we have $\tau_{s}=\alpha_{s}$. Also, for any $(s,j)$, $p_{+(s,j)}$ and $p_{-(s,j)}$ only differ on $a^{+}_{s,j}$ and $a^{-}_{s,j}$ where $p_{+(s,j)}(a^{+}_{s,j})=P(a^{+}_{s,j})+\alpha_{s}$ and $p_{-(s,j)}(a^{+}_{s,j})=P(a^{+}_{s,j})-\alpha_{s}$. Therefore, by Lemma 5.12, and the post-processing inequality,

\[
\min_{\phi}\left(\Pr_{X\sim{p_{+(s,j)}}^{n}}(\phi(X)=1)+\Pr_{X\sim{p_{-(s,j)}}^{n}}(\phi(X)=-1)\right)\geq 1/4.
\]

Lemma 5.8 then implies the result. ∎
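The paired perturbation used in this proof can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: scale $s$ is taken to contain masses in $(2^{-s-1},2^{-s}]$ (the exact boundary convention is defined outside this section), elements are paired in index order, and the names `scale_of`, `build_pairs`, `perturb` and the `signs` dictionary are hypothetical. The code only checks that the result is a probability vector. It does not verify membership in $\mathcal{N}(P)$.

```python
import math
from collections import defaultdict

def L(x, n):
    """L(x) = min{x(1-x), sqrt(x(1-x)/n)}."""
    return min(x * (1 - x), math.sqrt(x * (1 - x) / n))

def scale_of(p):
    """Assumed convention: scale s holds masses in (2^{-s-1}, 2^{-s}]."""
    return math.floor(-math.log2(p))

def build_pairs(P):
    """Group elements by scale and pair them in index order.
    With an odd count, the last element of a scale stays unpaired."""
    scales = defaultdict(list)
    for x, p in enumerate(P):
        if p > 0:
            scales[scale_of(p)].append(x)
    return {s: [(e[2 * j], e[2 * j + 1]) for j in range(len(e) // 2)]
            for s, e in scales.items()}

def perturb(P, n, pairs, signs):
    """Return p_u: move alpha_s = L(2^{-s-1})/2 between the two
    elements of each pair, in the direction given by signs[s][j]."""
    q = list(P)
    for s, plist in pairs.items():
        alpha = 0.5 * L(2.0 ** (-s - 1), n)
        for j, (a_plus, a_minus) in enumerate(plist):
            u = signs[s][j]  # +1 or -1
            q[a_plus] += u * alpha
            q[a_minus] -= u * alpha
    return q

# Hypothetical example distribution on four points.
P = [0.4, 0.3, 0.15, 0.15]
pairs = build_pairs(P)
q = perturb(P, n=100, pairs=pairs, signs={1: [+1], 2: [-1]})
print(pairs, q)
```

Because each pair gains and loses the same amount $\alpha_{s}$, the total mass is unchanged, and $\alpha_{s}<2^{-s-1}\leq P(x)$ keeps every entry positive.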

Theorem 5.7 follows immediately from Lemma 5.9, Lemma 5.10 and Lemma 5.11.

5.3 An $\varepsilon$-DP Distribution Estimation Algorithm

Now, let us return to HSTs and designing an estimation algorithm that achieves the target estimation rate, up to logarithmic factors. As in the one-dimensional setting, we want to restrict to only privately estimating the density at a small number ($\approx\varepsilon n$) of points. While we could try to mimic the one-dimensional solution by privately estimating a solution to the $\varepsilon n$-median problem, it is not clear how to prove that such an approach is instance-optimal. It turns out that a simpler solution more amenable to analysis will suffice. Our algorithm has two stages: first we attempt to find the set of $\frac{\log(1/\delta)}{\varepsilon n}$-active nodes, then we estimate the weight of these active nodes. Since these nodes have weight greater than $\frac{\log(1/\delta)}{\varepsilon n}$, we can privately estimate them to within constant multiplicative error.

Let $\mathcal{X}$ be the underlying metric space so $P\in\Delta(\mathcal{X})$. For any set $S$ of nodes and a function $F$ defined on the nodes, define the function $F|_{S}$ as $F|_{S}(\nu)=F(\nu)$ if $\nu\in S$ and $F|_{S}(\nu)=0$ otherwise. Given two functions $F$ and $G$ defined on the nodes, we define

\[
\mathfrak{W}(F,G)=\sum_{\nu}r_{\nu}|F(\nu)-G(\nu)|,
\]

where $r_{\nu}$ is the length of the edge connecting $\nu$ to its parent, and the sum is over all nodes in the tree. So by Lemma 5.4, $\mathcal{W}(P,Q)=\mathfrak{W}(\mathfrak{G}_{P},\mathfrak{G}_{Q})$. Note that $\mathfrak{W}$ satisfies the triangle inequality.
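As a small illustration of these two definitions, the sketch below computes $F|_{S}$ and $\mathfrak{W}(F,G)$ for node functions stored as dictionaries. The representation (node labels as dictionary keys, an `edge_len` map from each non-root node to $r_{\nu}$) and the names `restrict` and `tree_distance` are assumptions made for this example.

```python
def restrict(F, S):
    """F|_S: agree with F on nodes in S and equal 0 on all other nodes."""
    return {v: (val if v in S else 0.0) for v, val in F.items()}

def tree_distance(F, G, edge_len):
    """W(F, G) = sum over nodes v of r_v * |F(v) - G(v)|,
    where edge_len[v] = r_v is the length of the edge from v to its parent."""
    return sum(r * abs(F.get(v, 0.0) - G.get(v, 0.0))
               for v, r in edge_len.items())

# Hypothetical two-leaf tree: root with children "a" and "b", unit edges.
edge_len = {"a": 1.0, "b": 1.0}
F = {"a": 0.7, "b": 0.3}
G = {"a": 0.5, "b": 0.5}
print(tree_distance(F, G, edge_len))                     # weighted L1 gap
print(tree_distance(F, restrict(F, {"a"}), edge_len))    # mass dropped at "b"
```

Since the quantity is a weighted $\ell_{1}$ distance between node functions, the triangle inequality noted above is inherited directly from the absolute value.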

Algorithm 1 PrivDensityEstTree
1: Input: $D\in\mathcal{X}^{n}$, $\varepsilon$
2: $\widehat{\mathfrak{G}_{P}}=\texttt{EmpDist}(D)$ $\triangleright$ Compute empirical distribution.
3: $\hat{\gamma}_{\varepsilon}=\texttt{LocateActiveNodes}(\widehat{\mathfrak{G}_{P}};\varepsilon)$ $\triangleright$ Privately approximate set of active nodes.
4: Define $\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}}$ by $\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}}(x)=\begin{cases}0&\text{if }x\notin\hat{\gamma}_{\varepsilon}\\ \widehat{\mathfrak{G}_{P}}(x)+\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)&\text{otherwise.}\end{cases}$ $\triangleright$ Approximate densities.
5: $\hat{P}_{n,\varepsilon}=\texttt{Projection}(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})$ $\triangleright$ Project noisy densities onto space of distributions.
6: return $\hat{P}_{n,\varepsilon}$
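Step 4 of Algorithm 1 is simple enough to sketch directly. The code below is an illustration under assumptions, not the authors' implementation: the empirical node function is a dictionary, the active set is passed in as given (the procedures LocateActiveNodes and Projection are not sketched here), and the names `laplace` and `noisy_node_values` are hypothetical.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_node_values(emp, active, eps, n, rng=None):
    """Step 4: zero outside the active set, empirical value plus
    Lap(1/(eps*n)) noise inside it."""
    rng = rng or random.Random()
    scale = 1.0 / (eps * n)
    return {v: (emp[v] + laplace(scale, rng) if v in active else 0.0)
            for v in emp}

# Hypothetical empirical node weights and active set.
emp = {"a": 0.60, "b": 0.39, "c": 0.01}
out = noisy_node_values(emp, active={"a", "b"}, eps=1.0, n=1000,
                        rng=random.Random(0))
print(out)
```

The output is generally not a valid distribution (values need not sum to one or stay nonnegative), which is why the algorithm follows this step with a projection.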

A high-level outline of the proposed algorithm is given in Algorithm 1. Now, we state the main theorem of this section.

Theorem 5.13.

Given any $\varepsilon>0$, PrivDensityEstTree is $(0pt+1)\varepsilon$-DP. Given a distribution $P$, with probability $1-(0pt\log n+40pt\varepsilon n)\beta$,

\begin{align*}
\mathcal{W}(P,\hat{P}_{\varepsilon})=O\Bigg(&\sum_{\ell\in[0pt]}\sum_{x\in[N_{\ell}]}\min\left\{P_{\ell}(x),1-P_{\ell}(x)\sqrt{\frac{P_{\ell}(x)\log(n/\beta)}{n}}\right\}\\
&+\sum_{\nu\notin\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+2\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{\varepsilon n}\right\}\right)}\mathfrak{G}_{P}(\nu)+\frac{|\gamma_{P_{\ell}}\left(\frac{1}{2\varepsilon n}\right)-1|\log(1/\beta)}{\varepsilon n}\Bigg)
\end{align*}

This bound has the same three terms as our lower bound on $\mathcal{R}_{\mathcal{N},n,\varepsilon}$ in Theorem 5.5, corresponding again to the empirical error (the error inherent even in the absence of a privacy requirement), the error from the private algorithm not being able to estimate the probability of events that occur with probability less than $\approx\log(1/\delta)/(\varepsilon n)$, and the error due to the noise added to the active nodes. The maximum over the levels that appeared in the lower bound is replaced with a sum over the levels in the upper bound, so, up to logarithmic factors, the upper bound is within a factor of $0pt$ of the lower bound. Since we cannot hope to locate the set of $\log(1/\delta)/(\varepsilon n)$-active nodes exactly with a private algorithm, we find a set $\hat{\gamma}_{\varepsilon}$ that is guaranteed to satisfy

\[
\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+2\frac{\log(2/\beta)}{n},\frac{192\log(n/\beta)}{n}\right\}\right)\subset\hat{\gamma}_{\varepsilon}\subset\gamma_{P}\left(\frac{1}{2\varepsilon n}\right).
\]

Note that $\max\left\{\frac{2}{\varepsilon n}+2\frac{\log(2/\beta)}{n},\frac{192\log(n/\beta)}{n}\right\}\leq\frac{C\log(n/\beta)}{\varepsilon n}$, so the error introduced here by not estimating $\gamma_{P}\left(\frac{1}{\varepsilon n}\right)$ perfectly is at most a logarithmic multiplicative factor.

Algorithm 2 EmpDist
1: Input: $D\in\mathcal{X}^{n}$, $A$
2: Let $\hat{P}_{n}$ be the empirical distribution.
3: for all nodes $\nu$ do
4:     $\widehat{\mathfrak{G}_{P}}(\nu)=\begin{cases}0 & \mathfrak{G}_{\hat{P}_{n}}(\nu)<\frac{\sqrt{\log(n/\beta)}}{n}\\ 1 & \mathfrak{G}_{\hat{P}_{n}}(\nu)>1-\frac{\sqrt{\log(n/\beta)}}{n}\\ \mathfrak{G}_{\hat{P}_{n}}(\nu) & \text{otherwise}\end{cases}$

The first step of our algorithm is to estimate the empirical distribution. We use a truncated version of the standard empirical distribution. This allows us to achieve an error rate of $\min\{P(x),\sqrt{P(x)/n}\}$ even when $P(x)$ is small.
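To make the truncation rule of Algorithm 2 concrete, the following is a minimal Python sketch for the nodes at a single level of the tree. It is an illustration under stated assumptions rather than the authors' implementation: the function name `truncated_empirical_mass`, the encoding of each sample by the index of the node containing it, and the argument names are all hypothetical.

```python
import math
from collections import Counter


def truncated_empirical_mass(node_ids, n_nodes, beta):
    """Truncated empirical mass of each node at one level (sketch of Algorithm 2).

    node_ids: for each of the n samples, the index (0..n_nodes-1) of the node
              at this level that contains the sample (hypothetical encoding).
    beta:     failure-probability parameter appearing in the threshold.
    """
    n = len(node_ids)
    # Threshold sqrt(log(n / beta)) / n from line 4 of Algorithm 2.
    threshold = math.sqrt(math.log(n / beta)) / n
    counts = Counter(node_ids)
    masses = []
    for node in range(n_nodes):
        mass = counts.get(node, 0) / n  # plain empirical mass of the node
        if mass < threshold:
            mass = 0.0                  # too small to trust: round down to 0
        elif mass > 1.0 - threshold:
            mass = 1.0                  # almost all of the mass: round up to 1
        masses.append(mass)
    return masses
```

For example, with $n=100$ and $\beta=0.1$ the threshold is roughly $0.026$, so a node containing only two of the samples is assigned mass $0$ while a node containing three keeps its empirical mass.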

The proof of the following lemma is contained in Appendix C.

Lemma 5.14.

For any distribution $P$, if $\log(n/\beta)>1$ then with probability $1-3\cdot 0pt\cdot\beta$,

\[
\mathfrak{W}(\widehat{\mathfrak{G}_{P}},\mathfrak{G}_{P})\leq\sum_{\ell\in[0pt]}\sum_{x\in[N_{\ell}]}\min\left\{P_{\ell}(x)(1-P_{\ell}(x)),4\sqrt{3\frac{P_{\ell}(x)(1-P_{\ell}(x))\log(n/\beta)}{n}}\right\}.
\]
Algorithm 3 LocateActiveNodes
1: Input: $\widehat{\mathfrak{G}_{P}}$, $\varepsilon$
2: Let $\ell=0$ and $\hat{\gamma}_{\varepsilon,0}=\{\nu\}$ where $\nu$ is the root node.
3: while $\hat{\gamma}_{\varepsilon,\ell}\neq\varnothing$ and $\ell<0pt$ do
4:     $\hat{\gamma}_{\varepsilon,\ell+1}=\varnothing$
5:     for all $\nu\in\hat{\gamma}_{\varepsilon,\ell}$ do
6:         for all children $\nu'$ of $\nu$ do
7:             if $\widehat{\mathfrak{G}_{P}}(\nu')+\mathsf{Lap}(\frac{1}{\varepsilon n})>2\kappa+\frac{\log(2/\beta)}{\varepsilon n}$ then
8:                 $\hat{\gamma}_{\varepsilon,\ell+1}=\hat{\gamma}_{\varepsilon,\ell+1}+\{\nu'\}$
9:     $\ell=\ell+1$
10: return $\cup\hat{\gamma}_{\varepsilon,\ell}$

The goal of Algorithm 3 is to estimate the set of $1/(\varepsilon n)$-active nodes.
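The level-by-level search of Algorithm 3 can be sketched in Python as follows. This is a hedged illustration, not a vetted differentially private implementation: the tree encoding (a dictionary `children` from a node to its list of children, and a dictionary `mass` holding the truncated empirical values), the helper `laplace`, and all parameter names are assumptions made for the example, and `kappa` is simply passed in as a number. Floating-point Laplace sampling of this kind is known to need extra care in real privacy-critical code.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def locate_active_nodes(children, mass, root, depth, eps, n, kappa, beta, rng=None):
    """Noisy top-down search for heavy nodes (sketch of Algorithm 3).

    children: dict mapping a node to the list of its children (hypothetical encoding).
    mass:     dict mapping a node to its truncated empirical mass.
    Returns the union of the kept sets over all levels, root included.
    """
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (eps * n)
    threshold = 2.0 * kappa + math.log(2.0 / beta) / (eps * n)
    frontier = [root]  # nodes kept at the current level
    found = [root]     # union of the kept sets over the levels seen so far
    level = 0
    while frontier and level < depth:
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                # Keep the child only if its noisy mass clears the threshold.
                if mass[child] + laplace(scale, rng) > threshold:
                    next_frontier.append(child)
        found.extend(next_frontier)
        frontier = next_frontier
        level += 1
    return found
```

Children of a discarded node are never queried, which is why the search only ever touches the subtree below nodes that were kept.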

The next lemma allows us to bound how close to the goal we get. The proof is contained in Appendix C.

Lemma 5.15.

Let $\hat{\gamma}_{\varepsilon}$ be the set of active nodes found in Algorithm 1. Then with probability $1-0pt(\log n+4\varepsilon n)\beta$,

\[
\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)\subset\hat{\gamma}_{\varepsilon}\subset\gamma_{P}\left(\frac{1}{2\varepsilon n}\right).
\]

We also prove the following lemma relating the error due to estimating the active nodes to a quantity depending on the true active nodes.

Lemma 5.16.

If $\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)\subset\hat{\gamma}_{\varepsilon}$ then

\[
\mathfrak{W}(\widehat{\mathfrak{G}_{P}},\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})\leq\mathfrak{W}(\mathfrak{G}_{P},\widehat{\mathfrak{G}_{P}})+\mathfrak{W}\left(\mathfrak{G}_{P},\mathfrak{G}_{P}|_{\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}\right).
\]
Algorithm 4 Projection
1: Input: $\mathfrak{G}$, a real-valued function on the nodes of the HST such that $\mathfrak{G}(\nu_{0})=1$ where $\nu_{0}$ is the root node.
2: $\bar{\mathfrak{G}}=\mathfrak{G}$
3: for $\ell=0:0pt-1$ do
4:     for all nodes $\nu$ at level $\ell$ do
5:         Let $A_{\nu}=\sum\mathfrak{G}(\nu')$ where the sum is over the children of $\nu$.
6:         Let $d_{\nu}$ be the number of children of $\nu$.
7:         if $A_{\nu}=0$ then
8:             for all children $\nu'$ of $\nu$ do
9:                 $\bar{\mathfrak{G}}(\nu')=\frac{1}{d_{\nu}}\bar{\mathfrak{G}}(\nu)$
10:         else
11:             for all children $\nu'$ of $\nu$ do
12:                 $\bar{\mathfrak{G}}(\nu')=\frac{\bar{\mathfrak{G}}(\nu)}{A_{\nu}}\mathfrak{G}(\nu')$
13: return $\bar{\mathfrak{G}}$

The key component of this proof is that any discrepancy between the weight of the nodes under $P$ and that assigned by $\widehat{\mathfrak{G}_{P}}$ was already paid for in $\mathcal{W}(P,\widehat{\mathfrak{G}_{P}})$.

The final step in Algorithm 1 is to project the noisy function $\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}}$ into the space of distributions on the underlying metric space. We would like to do this in a way that preserves, up to a constant, the $\mathfrak{W}$ distance between $P$ and $\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}}$. We do this iteratively starting from the root node, by ensuring that the values of each node's children sum to its assigned value. Since we know the root node has value 1, this results in a valid distribution. We start from the top of the tree since errors in higher nodes of the tree contribute more to the Wasserstein distance. While errors in higher nodes of the tree can propagate to lower levels, the predominant influence on the overall error is retained at the top level due to the geometric nature of the edge weights.
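The top-down rescaling performed by Algorithm 4 can be sketched in Python as below. This is a minimal illustration under assumptions that are not part of the paper: the tree is encoded as a dictionary `children` from a node to its list of children, the input function is a dictionary `g`, and the names `project`, `children`, and `g` are hypothetical. The example inputs are non-negative; the sketch does not add any handling for negative noisy values beyond what the pseudocode states.

```python
def project(children, g, root, depth):
    """Top-down projection in the style of Algorithm 4 (sketch).

    children: dict mapping a node to the list of its children (hypothetical encoding).
    g:        dict mapping each node to a real value, with g[root] == 1.
    Returns a new dict in which the children of every node sum to the node's value.
    """
    g_bar = dict(g)  # start from a copy, so the input is left untouched
    level_nodes = [root]
    for _ in range(depth):
        next_nodes = []
        for node in level_nodes:
            kids = children.get(node, [])
            if not kids:
                continue
            total = sum(g[k] for k in kids)  # A_nu, computed from the input values
            for k in kids:
                if total == 0:
                    # No mass recorded below this node: split its value evenly.
                    g_bar[k] = g_bar[node] / len(kids)
                else:
                    # Rescale the children so that they sum to the parent's value.
                    g_bar[k] = g_bar[node] * g[k] / total
            next_nodes.extend(kids)
        level_nodes = next_nodes
    return g_bar
```

Because each level is rescaled against the already-corrected value of its parent, the values at every level of the output sum to the root value of 1.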

Lemma 5.17.

For any real-valued function $\mathfrak{G}$ on the nodes of the HST such that $\mathfrak{G}(\nu_{0})=1$, where $\nu_{0}$ is the root node, and any distribution $P$,

\[
\mathcal{W}(P,\texttt{Projection}(\mathfrak{G}))\leq 4\mathfrak{W}(\mathfrak{G}_{P},\mathfrak{G}).
\]

Combining the above lemmas appropriately gives the proof of Theorem 5.13 (see Appendix C).

Proof of Theorem 5.13.

The privacy guarantee follows from the fact that each user contributes to at most $0pt$ queries in LocateActiveNodes and at most one coordinate in the computation of $\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}}$ in line 4 of PrivDensityEstTree.

For the utility, we will consider each level $\ell$ individually. First suppose that $|\gamma_{P_{\ell}}(1/(2\varepsilon n))|>1$. Then

\begin{align*}
\mathcal{W}(P_{\ell},(\hat{P}_{\varepsilon})_{\ell})
&\leq 2\mathfrak{W}((\mathfrak{G}_{P})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\\
&\leq 2\Big(\mathfrak{W}((\mathfrak{G}_{P})_{\ell},(\widehat{\mathfrak{G}_{P}})_{\ell})+\mathfrak{W}((\widehat{\mathfrak{G}_{P}})_{\ell},(\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell})+\mathfrak{W}((\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\Big) \tag{8}\\
&\leq 2\Big(2\mathfrak{W}((\mathfrak{G}_{P})_{\ell},(\widehat{\mathfrak{G}_{P}})_{\ell})+\mathfrak{W}\big((\mathfrak{G}_{P})_{\ell},(\mathfrak{G}_{P}|_{\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+2\frac{\log(2/\beta)}{n},\frac{\log(n/\beta)}{n}\right\}\right)})_{\ell}\big)+\mathfrak{W}((\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\Big),
\end{align*}

where the first inequality follows from Lemma 5.17, the second inequality follows from the triangle inequality and Lemma 5.4, and the third follows from Lemma 5.16 and Lemma 5.15. Finally,

\[
\mathfrak{W}((\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\leq\sum_{\nu\in\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}r_{\nu}\left|\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)\right|\leq\frac{1}{2}\sum_{\nu\in\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}r_{\nu}\left|\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)\right|.
\]

The final statement then follows from Lemma 5.14 and basic concentration bounds on the Laplace distribution.

If $|\gamma_{P_{\ell}}(1/(2\varepsilon n))|=1$, then the proof goes through for all except the final term related to the noise due to privacy. We consider two cases. Let $x\in\gamma_{P_{\ell}}(1/(2\varepsilon n))$. First suppose that $P_{\ell}(x)>1-\frac{1}{2\varepsilon n}$. Then no node that is in a level above $x$, but is not a direct ancestor of $x$, is in $\gamma_{P_{\ell}}(1/(2\varepsilon n))$. Therefore, since the projection algorithm is top-down, $(\hat{P}_{n,\varepsilon})_{\ell}$ will be concentrated on $x$. Hence the error of level $\ell$ is simply $(1-P(x))$, which can be charged to the first term plus the sum of the weights of the inactive nodes, which is in the second term.
Next, suppose that $P_{\ell}(x)<1-\frac{1}{2\varepsilon n}$. Then the sum of the weights of the inactive nodes (in term two) dominates the error due to adding noise to $P(x)$.

6 Instance Optimal Density Estimation on $\mathbb{R}$ in Wasserstein distance

Let us now consider the setting of estimating distributions $P$ on $\mathcal{X}=\mathbb{R}$. In this setting, the target estimation rate is that of an algorithm that knows that the distribution is either $P$ or $Q_{P}$ for a distribution $Q_{P}$ such that $D_{\infty}(P,Q_{P})\leq\ln 2$. This definition of instance-optimality strengthens that corresponding to the so-called hardest one-dimensional subproblem [DL91], since this is a harder estimation rate to achieve. A formal description of the target estimation rate is given in Section 3.1 and Section 3.3. In Section 6.1, we lower bound this estimation rate using hypothesis testing techniques. Then, in Section 6.2, we give an algorithm that, up to polylogarithmic factors, uniformly achieves the lower bound, and hence approximately achieves the instance-optimal estimation rate. Our instance optimality results apply to all continuous distributions on a bounded interval with density functions (though it is likely that they apply more generally). All omitted proofs can be found in Appendix F.

6.1 General Lower Bound

To state the main theorem in this section, we will introduce some notation. We start by defining the restriction of a distribution.

Definition 6.1.

For any distribution $P$ over $\mathbb{R}$ with a density function, the restriction $P|_{u,v}$ of $P$ with respect to $u\leq v\in\mathbb{R}$ is defined as the distribution with the following CDF $F'_{P_{u,v}}$, where $F_{P}$ denotes the CDF of $P$:

\[
F'_{P_{u,v}}(t)=\begin{cases}0 & t<u\\ F_{P}(t) & u\leq t<v\\ 1 & t\geq v\end{cases}
\]

If $u=v$, then $F'$ is a step function that goes to $1$ at that point and is $0$ prior to that point.
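As a small illustration of Definition 6.1, the Python sketch below builds the restricted CDF from a given CDF. The names `restricted_cdf` and `clip` are hypothetical and chosen only for this example. The sketch also records an equivalent view that follows directly from the definition: the restriction is the distribution of a sample from $P$ after clipping it to $[u,v]$, so the mass below $u$ becomes an atom at $u$ and the mass at or above $v$ becomes an atom at $v$.

```python
def restricted_cdf(cdf, u, v):
    """Return the CDF of the restriction P|_{u,v}, given the CDF of P."""
    if u > v:
        raise ValueError("the restriction requires u <= v")

    def restricted(t):
        if t < u:
            return 0.0   # no mass strictly below u
        if t >= v:
            return 1.0   # all mass lies at or below v
        return cdf(t)    # unchanged on [u, v)

    return restricted


def clip(y, u, v):
    # Clipping a sample y ~ P to [u, v] yields a sample from P|_{u,v}.
    return min(max(y, u), v)
```

When `u == v` the middle branch is never reached, and the returned function is the step function described above.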

Also recall the following definition of quantiles.

Definition 6.2.

For $0<\alpha\leq 1$, the $\alpha$-quantile of a distribution $P$ over $\mathbb{R}$ is defined as follows:

\[
q_{\alpha}(P)=\arg\min_{t}\{\Pr_{y\sim P}(y\leq t)\geq\alpha\}.
\]

When the distribution $P$ is clear from context, we will sometimes abuse notation and use $q_{\alpha}$ when we mean $q_{\alpha}(P)$. The main theorem we will prove in this section is the following:

Theorem 6.3.

There exists a constant $C$ such that given a continuous distribution $P$ on $\mathbb{R}$ with bounded expectation and $\varepsilon\in(0,1]$, $n\in\mathbb{N}$,

\[
\begin{aligned}
\mathcal{R}_{loc,n,\varepsilon}(P)=\Omega\Bigg(&\frac{1}{\varepsilon n}\left(q_{1-\frac{1}{C\varepsilon n}}-q_{\frac{1}{C\varepsilon n}}\right)+\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon n}},q_{1-\frac{1}{C\varepsilon n}}}\right)\\
&+\frac{1}{\sqrt{\log n}}\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon n}},q_{1-\frac{1}{C\varepsilon n}}},\hat{P}_{n}|_{q_{\frac{1}{C\varepsilon n}},q_{1-\frac{1}{C\varepsilon n}}}\right)\right]\Bigg),
\end{aligned}
\]

where $\hat{P}_{n}$ is the empirical distribution on $n$ samples drawn independently from $P$.

The same result can be extended to $(\varepsilon,\delta)$-DP algorithms as well, for $\delta=o(\frac{1}{n})$.

We discuss each of the terms in turn. Note that the final term is related to the expected Wasserstein distance between the empirical distribution and the true distribution. There is now a long line of work characterizing this quantity in terms of the distribution (see Section 4), but essentially, if the distribution is more concentrated, this term is smaller. The first term is a very particular inter-quantile distance that is also much smaller for concentrated distributions, and can be large for relatively dispersed distributions. The second term characterizes the length of the tails of the distribution: longer tails make this Wasserstein distance larger. Overall, this rate is significantly lower for more concentrated distributions with small support, and relatively large for more dispersed distributions. We prove this theorem over the next two sections: in Section 6.1.1 we characterize the cost of private instance optimality, and in Section 6.1.2 we characterize the cost of achieving instance optimality without privacy (this non-private characterization is also new to our work, to the best of our knowledge). Combining the theorems in those sections gives the above result.

6.1.1 The Privacy Term

The main theorem we will prove in this section is the following.

Theorem 6.4.

Fix $\varepsilon\in(0,1]$, $n\in\mathbb{N}$. For all distributions $P$ over $\mathbb{R}$ that have a density function and finite expectation, there exists another distribution $Q''$ with $D_{\infty}(P,Q'')\leq 2$ that is indistinguishable from $P$ given $O(n)$ samples, such that for all $\varepsilon$-DP algorithms $A:\mathbb{R}^{n}\to\Delta(\mathbb{R})$, with probability at least $0.25$ over the draws $\mathbf{x}\sim P^{n}$, $\mathbf{x}'\sim Q''^{n}$, the following holds for some constant $C$.

$$\max\left(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q'',A(\mathbf{x}'))\right)\geq\frac{1}{4C\varepsilon n}\left(q_{1-\frac{1}{C\varepsilon n}}-q_{\frac{1}{C\varepsilon n}}\right)+\frac{1}{4}\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon n}},q_{1-\frac{1}{C\varepsilon n}}}\right).$$

We start with some notation. For any distribution $P$ with a density, let $f_{P}$ denote its density function. Throughout this section, we will use $q_{\alpha}$ to represent the $\alpha$-quantile of distribution $P$. Let $L(P)$ be the ‘starting point’ of distribution $P$, defined as $\inf_{t\in\mathbb{R}}\{t:F_{P}(t)>0\}$ if the infimum exists, and $-\infty$ otherwise.

Next, we describe some results on differentially private testing that we will use. We say that a testing algorithm $A_{test}$ distinguishes two distributions $P$ and $Q$ with $n$ samples if, given the promise that a dataset of size $n$ is drawn from either $P^{n}$ or $Q^{n}$, with probability at least $\frac{2}{3}$ it outputs $P$ if the dataset was drawn from $P^{n}$ and $Q$ if it was drawn from $Q^{n}$. We now state a theorem lower bounding the sample complexity of differentially private hypothesis testing.

Theorem 6.5 ([CKM+19, Theorem 1.2]).

Fix $n\in\mathbb{N}$, $\varepsilon>0$. For every pair of distributions $P,Q$ over $\mathbb{R}$, if there exists an $\varepsilon$-DP testing algorithm $A_{test}$ that distinguishes $P$ and $Q$ with $n$ samples (Footnote 4: The same bounds, and hence all our results in this subsection, can be extended to $(\varepsilon,\delta)$-DP with $\delta\leq\varepsilon$ by using an equivalence of pure and approximate DP for identity and closeness testing [ASZ17, Lemma 5].), then

$$n=\Omega\left(\frac{1}{\varepsilon\tau(P,Q)+(1-\tau(P,Q))H^{2}(P',Q')}\right),$$

where

$$\tau(P,Q)=\max\Big\{\int_{\mathbb{R}}\max\{f_{P}(t)-e^{\varepsilon}f_{Q}(t),0\}dt,\int_{\mathbb{R}}\max\{f_{Q}(t)-e^{\varepsilon}f_{P}(t),0\}dt\Big\},$$

and $H^{2}(\cdot,\cdot)$ is the squared Hellinger distance between $P'=\frac{\min(e^{\varepsilon}Q,P)}{1-\tau(P,Q)}$ and $Q'=\frac{\min(e^{\varepsilon'}P,Q)}{1-\tau(P,Q)}$, where $0\leq\varepsilon'\leq\varepsilon$ is such that if $\tau(P,Q)=\int_{\mathbb{R}}\max\{f_{P}(t)-e^{\varepsilon}f_{Q}(t),0\}dt$, then $\varepsilon'$ is the maximum value such that

$$\tau(P,Q)=\int_{\mathbb{R}}\max\{f_{Q}(t)-e^{\varepsilon'}f_{P}(t),0\}dt,$$

else $\varepsilon'$ is the maximum value such that

$$\tau(P,Q)=\int_{\mathbb{R}}\max\{f_{P}(t)-e^{\varepsilon'}f_{Q}(t),0\}dt.$$

We are now ready to prove our main theorem.

Proof.

(of Theorem 6.4) The idea is to construct $Q$ from $P$ by moving mass from the leftmost quantiles to the rightmost quantiles. We do this so that $Q$ is statistically close enough to $P$ that the two distributions cannot be distinguished with $n$ samples, yet is far from $P$ in Wasserstein distance. This produces a lower bound of $(1/2)\mathcal{W}(P,Q)$ on how well an algorithm can simultaneously estimate $P$ and $Q$: if there were an algorithm that produced good estimates of both $P$ and $Q$ in Wasserstein distance with $n$ samples, then we could use it to tell them apart, which would give a contradiction.

Let $k$ be a quantity to be set later. Formally, we define $Q$ as the distribution with the following density function.

$$f_{Q}(t)=\begin{cases}\frac{1}{2}f_{P}(t),&\text{for }t<q_{1/k}\\ f_{P}(t),&\text{for }q_{1/k}\leq t<q_{1-\frac{1}{k}}\\ \frac{3}{2}f_{P}(t),&\text{for }q_{1-\frac{1}{k}}\leq t\end{cases}$$

Note that by the definition of $Q$, we have that $D_{\infty}(P,Q)\leq 2$.
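To make the construction concrete, here is a minimal Python sketch that builds the CDF of $Q$ from the CDF of $P$ by integrating the piecewise density above. It assumes that $P$ has a continuous, strictly increasing CDF, so that $t<q_{1/k}$ is equivalent to $F_{P}(t)<1/k$. The helper name `make_q_cdf`, the choice of a standard normal for $P$, and the value $k=10$ are illustrative assumptions, not part of the paper.

```python
from statistics import NormalDist


def make_q_cdf(p_cdf, k):
    """Return the CDF of Q, given the CDF of a continuous P and k > 2.

    Q has density f_P / 2 below q_{1/k}, f_P between q_{1/k} and
    q_{1-1/k}, and (3/2) f_P above q_{1-1/k}.
    """
    if k <= 2:
        raise ValueError("k must exceed 2 so that q_{1/k} < q_{1-1/k}")
    lo, hi = 1.0 / k, 1.0 - 1.0 / k  # CDF levels of the two cut points

    def q_cdf(t):
        u = p_cdf(t)
        if u < lo:
            return 0.5 * u
        if u < hi:
            return 0.5 * lo + (u - lo)
        return 0.5 * lo + (hi - lo) + 1.5 * (u - hi)

    return q_cdf


# Illustrative choice (not from the paper): P is a standard normal, k = 10.
p = NormalDist()
q_cdf = make_q_cdf(p.cdf, k=10)
```

The left tail loses mass $\frac{1}{2k}$ and the right tail gains the same amount, so `q_cdf` still tends to 1, and the density ratio $f_{Q}/f_{P}$ stays in $[\frac{1}{2},\frac{3}{2}]$ everywhere.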

We will prove that the sample complexity of telling apart $P$ and $Q$ under $(\varepsilon,\delta)$-DP is $\Omega(k/\varepsilon)$, using known results on hypothesis testing. Then, we will argue that the Wasserstein distance between $P$ and $Q$ is sufficiently large. Setting $k$ appropriately will complete the proof.

Define $SC_{\varepsilon,\delta}(P,Q)$ to be the smallest $n$ such that there exists an $(\varepsilon,\delta)$-DP testing algorithm that distinguishes $P$ and $Q$ with $n$ samples; we call this the sample complexity of privately distinguishing $P$ and $Q$.

Lemma 6.6.

$$SC_{\varepsilon,\delta}(P,Q)=\Omega(k/\varepsilon).$$

The proof of this lemma is in Appendix F. We next argue that $P$ and $Q$ are sufficiently far away in Wasserstein distance.

Lemma 6.7.

$$\mathcal{W}(P,Q)\geq\frac{1}{2k}\left(q_{1-\frac{1}{k}}-q_{1/k}\right)+\frac{1}{2}\mathcal{W}\left(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right).$$

The proof of this lemma is also in Appendix F.
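As a numerical sanity check of Lemma 6.7 (not a substitute for the proof in Appendix F), the sketch below evaluates both sides when $P$ is uniform on $[0,1]$. It relies on two assumptions that are not stated in this section: that $\mathcal{W}$ is the 1-Wasserstein distance, for which $\mathcal{W}(P,Q)=\int|F_{P}(t)-F_{Q}(t)|dt$ on the real line, and that $P|_{a,b}$ denotes $P$ with all mass outside $[a,b]$ moved to the nearest endpoint. All function names are hypothetical.

```python
def uniform_cdf(t):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, t))


def q_cdf(t, k):
    """CDF of Q from the proof of Theorem 6.4 when P is uniform on [0, 1]."""
    u = uniform_cdf(t)
    lo, hi = 1.0 / k, 1.0 - 1.0 / k
    if u < lo:
        return 0.5 * u
    if u < hi:
        return u - 0.5 * lo
    return 1.0 - 1.5 * (1.0 - u)


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def wasserstein_p_q(k):
    """W(P, Q) computed as the integral of |F_P - F_Q| over [0, 1]."""
    return integrate(lambda t: abs(uniform_cdf(t) - q_cdf(t, k)), 0.0, 1.0)


def lemma_rhs(k):
    """Right-hand side of Lemma 6.7 under the clipping assumption."""
    lo, hi = 1.0 / k, 1.0 - 1.0 / k  # for uniform P, q_alpha = alpha
    # Cost of moving the two tails to the nearest cut point.
    tail_cost = integrate(uniform_cdf, 0.0, lo) + integrate(
        lambda t: 1.0 - uniform_cdf(t), hi, 1.0
    )
    return (hi - lo) / (2.0 * k) + 0.5 * tail_cost
```

For this particular $P$ both quantities equal $\frac{1}{2k}-\frac{1}{2k^{2}}$, so under these assumptions the inequality holds with equality here.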

Finally, we are ready to prove the theorem. Assume that with probability larger than $0.75$ over the draw of two datasets $\mathbf{x}\sim P^{n}$, $\mathbf{x}'\sim Q^{n}$, and the randomness used by invocations of algorithm $A$, we have that $\max(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}')))<\frac{1}{2}\mathcal{W}(P,Q)$. Then, given a dataset $\mathbf{x}''$ of size $n$, we can perform the following test: run the differentially private algorithm $A$ on the dataset $\mathbf{x}''$, compute $\mathcal{W}(P,A(\mathbf{x}''))$ and $\mathcal{W}(Q,A(\mathbf{x}''))$, and output the distribution with lower distance.
Then, note that $\mathcal{W}(P,Q)\leq\mathcal{W}(P,A(\mathbf{x}''))+\mathcal{W}(Q,A(\mathbf{x}''))$, which implies that with probability at least $0.75$, $\mathcal{W}(Q,A(\mathbf{x}''))>\frac{1}{2}\mathcal{W}(P,Q)$ if the dataset $\mathbf{x}''$ was sampled from $P^{n}$ (by the accuracy guarantee). A similar argument shows that with probability at least $0.75$, $\mathcal{W}(P,A(\mathbf{x}''))>\frac{1}{2}\mathcal{W}(P,Q)$ if the dataset $\mathbf{x}''$ was sampled from $Q^{n}$. Hence, with $n$ samples we have defined a test that distinguishes $P$ and $Q$. However, for $k=C\varepsilon n$ for some constant $C$, by Lemma 6.6 we get that any differentially private test distinguishing $P$ and $Q$ requires more than $n$ samples, which is a contradiction.
Hence, with probability at least $0.25$ over the draw of two datasets $\mathbf{x}\sim P^{n}$, $\mathbf{x}'\sim Q^{n}$, and the randomness used by invocations of algorithm $A$, we have that $\max(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}')))\geq\frac{1}{2}\mathcal{W}(P,Q)\geq\frac{1}{4C\varepsilon n}\left(q_{1-\frac{1}{C\varepsilon n}}-q_{\frac{1}{C\varepsilon n}}\right)+\frac{1}{4}\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon n}},q_{1-\frac{1}{C\varepsilon n}}}\right)$, where the last inequality is by invoking Lemma 6.7 with $k=C\varepsilon n$.
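The reduction from estimation to testing used in this proof can be sketched in a few lines of Python. In this sketch every distribution, including the estimator's output $A(\mathbf{x}'')$, is represented by an equal-size list of points, and the 1-Wasserstein distance between two such lists is computed by matching sorted values. That representation and the names `wasserstein_1d` and `closest_hypothesis` are assumptions made for illustration. Since the test only post-processes the output of $A$, it inherits the privacy guarantee of $A$.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical distributions."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    # On the real line the optimal coupling matches sorted values in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def closest_hypothesis(estimate, reference_p, reference_q):
    """Return "P" or "Q", whichever reference is closer to the estimate."""
    dist_p = wasserstein_1d(estimate, reference_p)
    dist_q = wasserstein_1d(estimate, reference_q)
    return "P" if dist_p <= dist_q else "Q"
```

If the estimate is within $\frac{1}{2}\mathcal{W}(P,Q)$ of the distribution that generated the data, the triangle inequality forces it to be farther than $\frac{1}{2}\mathcal{W}(P,Q)$ from the other one, so this rule returns the correct answer.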

6.1.2 Empirical Term

In this section, we prove the following result.

Theorem 6.8.

Fix sufficiently large natural numbers $n,k>0$ and let $C,C'>0$ be sufficiently small constants. For all algorithms $A:\mathbb{R}^{n}\to\Delta_{\mathbb{R}}$, the following holds. For all continuous distributions $P$ over $\mathbb{R}$ with a density and with bounded expectation, there exists another distribution $Q$ (with $D_{\infty}(P,Q)\leq\ln 2$) that is indistinguishable from $P$ given $O(n)$ samples, such that with probability at least $0.25$ over the draws $\mathbf{x}\sim P^{n}$, $\mathbf{x}'\sim Q^{n}$, the following holds.

$$\max\left(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}'))\right)\geq\frac{C'}{\sqrt{\log n}}\,\mathbb{E}_{\mathbf{x}''\sim P^{n}}\left[\mathcal{W}\left(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right)\right],$$

where $q_{\alpha}$ is the $\alpha$-quantile of $P$.

Before going into the proof, we state the following result on the sample complexity of testing. This is a folklore result; for a proof of the lower bound see [BY02], and for the upper bound see [Can17].

Theorem 6.9.

Fix $n\in\mathbb{N}$, $\varepsilon>0$. For every pair of distributions $P,Q$ over $\mathbb{R}$, if there exists a testing algorithm $A_{test}$ that distinguishes $P$ and $Q$ with $n$ samples, then

$$n=\Omega\left(\frac{1}{H^{2}(P,Q)}\right),$$

where $H^{2}(\cdot,\cdot)$ represents the squared Hellinger distance between $P$ and $Q$.
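To give a feel for the quantity in Theorem 6.9, the following Python sketch computes the squared Hellinger distance between two discrete distributions and the resulting value of $1/H^{2}$. The theorem is stated for distributions over $\mathbb{R}$; the discrete setting, the normalization $H^{2}(p,q)=1-\sum_{i}\sqrt{p_{i}q_{i}}$ (some references differ by a constant factor), and the function names are assumptions made for illustration. The constant hidden in the $\Omega(\cdot)$ is ignored.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance 1 - sum_i sqrt(p_i * q_i) of two pmfs."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same support size")
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def sample_lower_bound(p, q):
    """The quantity 1 / H^2(p, q) from the theorem, constants omitted."""
    h2 = squared_hellinger(p, q)
    if h2 <= 0.0:
        return math.inf  # identical distributions can never be told apart
    return 1.0 / h2
```

For instance, a fair coin and a coin with bias $0.6$ have squared Hellinger distance of roughly $0.005$, so on the order of a couple of hundred samples are needed to distinguish them.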

Throughout the proof, we will use $q_{\alpha}$ to represent the $\alpha$-quantile of distribution $P$.

Proof of Theorem 6.8.

$Q$ is constructed by adding progressively more mass to $P$ up until $q_{1/2}$ and subtracting proportionate amounts of mass from $P$ afterwards. Intuitively, this is done in such a way that to ‘change’ $P$ to $Q$, for all $i\geq 2$ one has to move roughly $\min\{\frac{1}{\sqrt{2^{i}n}},\frac{1}{2^{i}}\}$ mass from $q_{1/2^{i}}$ to $q_{1-1/2^{i}}$. This ensures that the Wasserstein distance between $P$ and $Q$ is larger than the expected Wasserstein distance between $P$ and its empirical distribution on $n$ samples $\hat{P}_{n}$. This is carefully done to ensure that $P$ is indistinguishable from $Q$.

Formally, consider $i$ in the range $[2,\log n-1)$. For all $t\in(q_{1/2^{i}},q_{1/2^{i-1}}]$, we set $f_{Q}(t)=f_{P}(t)\left[1+\sqrt{\frac{2^{i}}{n}}\right]$. For all $t\in(q_{1-1/2^{i-1}},q_{1-1/2^{i}}]$, we set $f_{Q}(t)=f_{P}(t)\left[1-\sqrt{\frac{2^{i}}{n}}\right]$. Next, consider $i$ in the range $[\log n,\infty)$.
For all $t\in(q_{1/2^{i}},q_{1/2^{i-1}}]$, we set $f_{Q}(t)=f_{P}(t)\left[1+\frac{1}{2}\right]$. For all $t\in(q_{1-1/2^{i-1}},q_{1-1/2^{i}}]$, we set $f_{Q}(t)=f_{P}(t)\left[1-\frac{1}{2}\right]$. Note that $P$ has bounded expectation by assumption, and hence, so does $Q$. Additionally, note that $D_{\infty}(P,Q)\leq\ln 2$.

There are two key considerations balanced in the design of $Q$. On one hand, we need $Q$ to be indistinguishable from $P$ given $\tilde{O}(n)$ samples. On the other hand, we need $Q$ to be sufficiently far away from $P$ in Wasserstein distance. This ensures that given an accurate algorithm for estimating the density of the distribution (in Wasserstein distance) given access to $\tilde{O}(n)$ samples from it, we can design a test distinguishing $P$ and $Q$ with that many samples, thereby contradicting their indistinguishability.

Detailed proofs of the claims below can be found in Appendix F. First, we show that $P$ is indistinguishable from $Q$.

Lemma 6.10.
\[
KL(P,Q)=O(\log n/n).
\]

Next, we establish a lower bound on the Wasserstein distance between $P$ and $Q$.

Lemma 6.11.
\[
\mathcal{W}(P,Q)\geq\frac{1}{4}\left[\sum_{j=2}^{\log n-1}\frac{1}{\sqrt{2^{j}n}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right]+\sum_{j=\log n}^{\infty}\frac{1}{2^{j}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right]\right].
\]

Next, we upper bound the expected Wasserstein distance between the distribution $P$ and its empirical distribution on $n$ samples.

Lemma 6.12.
\[
\mathbb{E}[\mathcal{W}(P,\hat{P}_{n})]\leq 8\left[\sum_{i=2}^{\log n-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\right].
\]
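
To get a feel for the bracketed quantity that appears in both Lemma 6.11 and Lemma 6.12, the following minimal sketch evaluates it numerically for a user-supplied quantile function. The function name `quantile_spread_bound`, the use of base-2 logarithms with $\log n$ rounded down, and the truncation of the infinite tail at `max_level` are all assumptions made for illustration, not choices taken from the paper.

```python
import math

def quantile_spread_bound(quantile, n, max_level=60):
    """Evaluate the bracketed sum from Lemmas 6.11/6.12 for a quantile function.

    quantile(alpha) should return the alpha-quantile q_alpha of P.
    Assumptions: logs are base 2, log n is rounded down, and the
    infinite tail sum is truncated at max_level.
    """
    log_n = int(math.floor(math.log2(n)))
    total = 0.0
    for i in range(2, max_level + 1):
        # Inter-quantile spread q_{1 - 1/2^i} - q_{1/2^i}.
        spread = quantile(1.0 - 2.0 ** -i) - quantile(2.0 ** -i)
        if i <= log_n - 1:
            weight = 1.0 / math.sqrt((2.0 ** i) * n)  # first sum
        else:
            weight = 2.0 ** -i                        # tail sum
        total += weight * spread
    return total

# Example: the uniform distribution on [0, 1], where q_alpha = alpha.
def uniform_quantile(alpha):
    return alpha

bound_small = quantile_spread_bound(uniform_quantile, n=2 ** 8)
bound_large = quantile_spread_bound(uniform_quantile, n=2 ** 16)
```

For the uniform example the value shrinks as $n$ grows, as one would expect from a quantity that controls the empirical Wasserstein error.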

We now prove a simple claim regarding restrictions.

Claim 6.13 (Restrictions preserve Wasserstein distance).

For all datasets $\mathbf{x}$ and any natural number $k>1$, we have that

\[
\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\leq\mathcal{W}(P,\hat{P}_{n}).
\]

Finally, we are ready to put the above lemmas together to prove Theorem 6.8. Fix $n^{\prime}=\frac{n}{C\log n}$. Assume, for the sake of contradiction, that with probability larger than $0.75$ over the draw of two datasets $\mathbf{x}\sim P^{n^{\prime}}$, $\mathbf{x}^{\prime}\sim Q^{n^{\prime}}$, and the randomness used by invocations of algorithm $A$, we have that $\max\left(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}^{\prime}))\right)\leq\frac{1}{2}\mathcal{W}(P,Q)$.
Then, given a dataset $\mathbf{x}^{\prime\prime}$ of size $n^{\prime}$, we perform the following test: run the differentially private algorithm $A$ on the dataset $\mathbf{x}^{\prime\prime}$, compute $\mathcal{W}(P,A(\mathbf{x}^{\prime\prime}))$ and $\mathcal{W}(Q,A(\mathbf{x}^{\prime\prime}))$, and output the distribution with the lower distance. Then, note that $\mathcal{W}(P,Q)\leq\mathcal{W}(P,A(\mathbf{x}^{\prime\prime}))+\mathcal{W}(Q,A(\mathbf{x}^{\prime\prime}))$ by the triangle inequality, which implies that with probability at least $0.75$, $\mathcal{W}(Q,A(\mathbf{x}^{\prime\prime}))\geq\frac{1}{2}\mathcal{W}(P,Q)$ if $\mathbf{x}^{\prime\prime}\sim P^{n^{\prime}}$ (by the accuracy guarantee).
A similar argument shows that with probability at least $0.75$, $\mathcal{W}(P,A(\mathbf{x}^{\prime\prime}))\geq\frac{1}{2}\mathcal{W}(P,Q)$ if $\mathbf{x}^{\prime\prime}\sim Q^{n^{\prime}}$. Hence, with $n^{\prime}$ samples we have defined a test that distinguishes $P$ and $Q$. However, by Lemma 6.10 bounding the KL divergence between $P$ and $Q$, Theorem 6.9 on sample complexity lower bounds for testing, and Lemma A.4 on the relationship between KL and Hellinger distance, we get that any statistical test distinguishing $P$ and $Q$ requires more than $n^{\prime}$ samples, which is a contradiction. Hence, with probability at least $0.25$ over the draw of two datasets $\mathbf{x}\sim P^{n^{\prime}}$, $\mathbf{x}^{\prime}\sim Q^{n^{\prime}}$, and the randomness used by invocations of algorithm $A$, we must have that

\[
\max\left(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}^{\prime}))\right)\geq\frac{1}{2}\mathcal{W}(P,Q). \tag{9}
\]
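
The distinguishing test used in this argument can be sketched for distributions on a common finite grid as follows. This is only an illustration under stated assumptions: the helper names `wasserstein1` and `closest_hypothesis` are hypothetical, the estimate below is a hard-coded stand-in for the output $A(\mathbf{x}^{\prime\prime})$ rather than a differentially private estimator, and ties are broken in favour of $P$ arbitrarily.

```python
def wasserstein1(p, q, grid):
    """W1 distance between two pmfs p and q given on the same sorted grid.

    Uses the one-dimensional identity W1 = integral of |CDF_p - CDF_q|.
    """
    dist, cdf_gap = 0.0, 0.0
    for j in range(len(grid) - 1):
        cdf_gap += p[j] - q[j]
        dist += abs(cdf_gap) * (grid[j + 1] - grid[j])
    return dist

def closest_hypothesis(estimate, p, q, grid):
    """Return "P" or "Q", whichever candidate is closer to the estimate in W1."""
    d_p = wasserstein1(p, estimate, grid)
    d_q = wasserstein1(q, estimate, grid)
    return "P" if d_p <= d_q else "Q"  # tie-breaking rule is an arbitrary choice

# Toy instance (illustrative numbers only).
grid = [0.0, 1.0, 2.0, 3.0]
p = [0.25, 0.25, 0.25, 0.25]
q = [0.375, 0.375, 0.125, 0.125]
estimate = [0.26, 0.26, 0.24, 0.24]  # stands in for the output A(x'')

decision = closest_hypothesis(estimate, p, q, grid)
```

Whenever the estimate is within less than half of $\mathcal{W}(P,Q)$ of the true distribution, the triangle inequality forces it to be farther than that from the other candidate, so the test returns the correct answer.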

Next, note that by Lemma 6.12 (applied with $n^{\prime}$ in place of $n$), we have that

\begin{align*}
\mathbb{E}[\mathcal{W}(P,\hat{P}_{n^{\prime}})]
&\leq 8\left[\sum_{i=2}^{\log n^{\prime}-1}\frac{1}{\sqrt{2^{i}n^{\prime}}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n^{\prime}}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\right] \\
&= 8\left[\sum_{i=2}^{\log\frac{n}{C\log n}-1}\frac{\sqrt{C\log n}}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log\frac{n}{C\log n}}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\right] \\
&= 8\Bigg[\sqrt{C\log n}\sum_{i=2}^{\log n-\log(C\log n)-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&\qquad +\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\Bigg].
\end{align*}

Analyzing the middle term in the above sum, we have that

\begin{align*}
\sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]
&\leq \sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{\sqrt{2^{i}}}\frac{1}{\sqrt{2^{\log n-\log(C\log n)}}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&\leq \sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{\sqrt{2^{i}}}\frac{\sqrt{C\log n}}{\sqrt{n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&= \sqrt{C\log n}\sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right].
\end{align*}

Substituting this back in the previous sum, we have that

\begin{align*}
\mathbb{E}[\mathcal{W}(P,\hat{P}_{n^{\prime}})]
&\leq 8\Bigg[\sqrt{C\log n}\sum_{i=2}^{\log n-\log(C\log n)-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&\qquad +\sqrt{C\log n}\sum_{i=\log n-\log(C\log n)}^{\log n-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\Bigg] \\
&\leq 8\sqrt{C\log n}\Bigg[\sum_{i=2}^{\log n-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\Bigg] \\
&\leq 16\sqrt{C\log n^{\prime}}\Bigg[\sum_{i=2}^{\log n-1}\frac{1}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]\Bigg],
\end{align*}

where in the last inequality we use the fact that $n^{\prime}\geq\sqrt{n}$, so that $\log n\leq 2\log n^{\prime}$. Hence, by Lemma 6.11 (which gives a lower bound on $\mathcal{W}(P,Q)$) in conjunction with the above inequality, we have that $\mathcal{W}(P,Q)\geq\frac{C^{\prime}}{\sqrt{\log n^{\prime}}}\mathbb{E}[\mathcal{W}(P,\hat{P}_{n^{\prime}})]$ for some sufficiently small constant $C^{\prime}$. Substituting back in Equation 9 and applying Claim 6.13, we have that with probability at least $0.25$ over the draw of two datasets $\mathbf{x}\sim P^{n^{\prime}}$, $\mathbf{x}^{\prime}\sim Q^{n^{\prime}}$, and the randomness used by invocations of algorithm $A$,

\[
\max\left(\mathcal{W}(P,A(\mathbf{x})),\mathcal{W}(Q,A(\mathbf{x}^{\prime}))\right)\geq\frac{1}{2}\frac{C^{\prime}}{\sqrt{\log n^{\prime}}}\mathbb{E}[\mathcal{W}(P,\hat{P}_{n^{\prime}})]\geq\frac{1}{2}\frac{C^{\prime}}{\sqrt{\log n^{\prime}}}\mathbb{E}\left[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n^{\prime}}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right],
\]

as required.

6.2 Upper Bound

In this section, we describe an algorithm that achieves the instance optimal rate described in the previous section (up to polylogarithmic factors in some of the terms).

We will be looking at distributions $P$ supported on a discrete, ordered interval $\{a,a+\gamma,\dots,b-\gamma,b\}$. Note that by a simple coupling argument, any continuous distribution $P^{cont}$ on $[a,b]$ is at most $\gamma$ away in Wasserstein distance from a distribution on this grid. The dependence on $\gamma$ in our bounds for discrete distributions will be polylogarithmic in $1/\gamma$ (or better), and so our algorithms for estimating distributions $P$ on $\{a,a+\gamma,\dots,b-\gamma,b\}$ also give similar bounds for continuous distributions on $[a,b]$, up to a small additive term of $\gamma$, which can be set to any inverse polynomial in the dataset size without significantly affecting our bounds.

Formally, we will prove the following theorem (see Theorem 6.15 for a more detailed statement).
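
The coupling argument can be illustrated with a minimal sketch: round every point to its nearest grid point and pair each point with its rounded copy. The helper `round_to_grid` and the specific parameter values are hypothetical choices for illustration; rounding to the nearest grid point is one possible coupling, not necessarily the one the authors have in mind.

```python
import random

def round_to_grid(x, a, gamma, num_steps):
    """Map x in [a, a + num_steps * gamma] to the nearest grid point a + j * gamma."""
    j = round((x - a) / gamma)
    j = min(max(j, 0), num_steps)  # clamp to the grid endpoints
    return a + j * gamma

a, b, num_steps = 0.0, 1.0, 100
gamma = (b - a) / num_steps

rng = random.Random(0)
samples = [rng.uniform(a, b) for _ in range(1000)]
rounded = [round_to_grid(x, a, gamma, num_steps) for x in samples]

# Pairing each sample with its rounded copy is a coupling whose transport
# cost per point is at most max_shift, so W1 is at most max_shift as well.
max_shift = max(abs(x - y) for x, y in zip(samples, rounded))
```

Because no point moves by more than the grid spacing, the Wasserstein distance between the original and the rounded distribution is at most $\gamma$.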

Theorem 6.14.

Fix ε,β(0,1]𝜀𝛽01\varepsilon,\beta\in(0,1]italic_ε , italic_β ∈ ( 0 , 1 ], a,b𝑎𝑏a,b\in\mathbb{R}italic_a , italic_b ∈ blackboard_R, and γ<ba𝛾𝑏𝑎\gamma<b-a\in\mathbb{R}italic_γ < italic_b - italic_a ∈ blackboard_R such that baγ𝑏𝑎𝛾\frac{b-a}{\gamma}divide start_ARG italic_b - italic_a end_ARG start_ARG italic_γ end_ARG is an integer. Let n>c2log4baβγε𝑛subscript𝑐2superscript4𝑏𝑎𝛽𝛾𝜀n\in\mathbb{N}>c_{2}\frac{\log^{4}\frac{b-a}{\beta\gamma}}{\varepsilon}italic_n ∈ blackboard_N > italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT divide start_ARG roman_log start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT divide start_ARG italic_b - italic_a end_ARG start_ARG italic_β italic_γ end_ARG end_ARG start_ARG italic_ε end_ARG for some sufficiently large constant c2subscript𝑐2c_{2}italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. There exists an algorithm A𝐴Aitalic_A that for any distribution P𝑃Pitalic_P on {a,a+γ,a+2γ,,bγ,b}𝑎𝑎𝛾𝑎2𝛾𝑏𝛾𝑏\{a,a+\gamma,a+2\gamma,\dots,b-\gamma,b\}{ italic_a , italic_a + italic_γ , italic_a + 2 italic_γ , … , italic_b - italic_γ , italic_b } satisfies the following. When run with input a random sample 𝐱Pnsimilar-to𝐱superscript𝑃𝑛{\bf x}\sim P^{n}bold_x ∼ italic_P start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, A𝐴Aitalic_A outputs a distribution PDPsuperscript𝑃𝐷𝑃P^{DP}italic_P start_POSTSUPERSCRIPT italic_D italic_P end_POSTSUPERSCRIPT such that with probability at least 1β1𝛽1-\beta1 - italic_β over the randomness of 𝐱𝐱{\bf x}bold_x and the algorithm,

𝒲(P,PDP)=O(1k(q11kq1k)+𝒲(P,P|q1k,q11k)+lognβ𝔼[𝒲(P|q1k,q11k,P^n|q1k,q11k)]),𝒲𝑃superscript𝑃𝐷𝑃𝑂1𝑘subscript𝑞11𝑘subscript𝑞1𝑘𝒲𝑃evaluated-at𝑃subscript𝑞1𝑘subscript𝑞11𝑘𝑛𝛽𝔼delimited-[]𝒲evaluated-at𝑃subscript𝑞1𝑘subscript𝑞11𝑘evaluated-atsubscript^𝑃𝑛subscript𝑞1𝑘subscript𝑞11𝑘\mathcal{W}(P,P^{DP})=O\left(\frac{1}{k}\left(q_{1-\frac{1}{k}}-q_{\frac{1}{k}% }\right)+\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+\sqrt{\log\frac% {n}{\beta}}\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{% k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right)\right]\right),caligraphic_W ( italic_P , italic_P start_POSTSUPERSCRIPT italic_D italic_P end_POSTSUPERSCRIPT ) = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_k end_ARG ( italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT - italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT ) + caligraphic_W ( italic_P , italic_P | start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) + square-root start_ARG roman_log divide start_ARG italic_n end_ARG start_ARG italic_β end_ARG end_ARG blackboard_E [ caligraphic_W ( italic_P | start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT end_POSTSUBSCRIPT , over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_k end_ARG end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ] ) ,

where $\hat{P}_{n}$ is the empirical distribution on $n$ samples drawn independently from $P$, $q_{\alpha}$ represents the $\alpha$-quantile of distribution $P$, and $k=\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$ for a sufficiently large constant $c_{3}$.

Since $k\approx\varepsilon n/\log(n)$, this upper bound matches the lower bound in Theorem 6.3 in its dependence on $\varepsilon$ and its dependence on $n$ (up to logarithmic factors in $n$). The algorithm that we will analyze proceeds by estimating sufficiently many quantiles from the empirical distribution and distributing mass evenly between the chosen quantiles. The number of quantiles is chosen carefully to ensure that the estimated $\alpha$-quantiles are also approximately $\alpha$-quantiles for the empirical distribution (and hence also approximately for the true distribution), and to ensure that the CDF of the output distribution closely tracks the CDF of the empirical distribution. Through a careful analysis, we are able to leverage these properties to give instance optimality guarantees for the accuracy of the algorithm.

6.2.1 Algorithm for density estimation

Algorithm 5 is our algorithm for density estimation, and proceeds by differentially privately estimating sufficiently many quantiles of the distribution and placing equal mass on each of them. We argue that a simple CDF-based differentially private quantiles estimator $A_{quant}$ satisfies a specific guarantee that will be key to our analysis. See Appendix E for more details about the quantiles algorithm and formal statements and proofs therein.

Algorithm 5 Algorithm $A$ for estimating a distribution on $\mathbb{R}$
1: Input: ${\bf x}=(x_{1},\dots,x_{n})\sim P^{n}$, privacy parameter $\varepsilon$, interval end-points $a,b$, granularity $\gamma$, access to algorithm $A_{quant}$
2: Output: Distribution $P^{DP}$ on $\mathbb{R}$.
3: Let $k$ be set to $\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$ for a sufficiently large constant $c_{3}$.
4: Use Algorithm $A_{quant}$ referenced in Theorem E.2 with inputs interval end points $a,b$, granularity $\gamma$, ${\bf x}=(x_{1},\dots,x_{n})\in\{a,a+\gamma,\dots,b-\gamma,b\}^{n}$, and desired quantile values ${\bf\alpha}=\{1/(2k),3/(2k),5/(2k),\dots,(2k-1)/(2k)\}$, and let the outputs be $\tilde{q}_{1},\dots,\tilde{q}_{k}$.
5: for $j\in[k]$ do
6:      Set $P^{DP}(\tilde{q}_{j})=\frac{1}{k}$.
7: Output $P^{DP}$.

Observe that Algorithm 5 inherits the privacy of $A_{quant}$, since it simply postprocesses the quantiles it receives from that subroutine, and hence is also $\varepsilon$-DP.
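The steps of Algorithm 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `choose_k`, `empirical_quantiles` and `estimate_density` are hypothetical, the constant $c_{3}$ is unspecified in the text and is set to a placeholder value of 1.0, and the quantile routine shown is a non-private stand-in for $A_{quant}$ (whose construction is in Appendix E), so the output of this sketch is not differentially private. We also assume that when several estimated quantiles coincide, their masses add up.

```python
import math
from collections import Counter


def choose_k(n, eps, beta, a, b, gamma, c3=1.0):
    """Number of quantiles from step 3 of Algorithm 5.

    c3 is only described as "sufficiently large" in the text;
    the default 1.0 is a placeholder assumption.
    """
    domain_term = math.log((b - a) / (beta * gamma)) ** 3
    return math.ceil(eps * n / (4 * c3 * domain_term * math.log(n / beta)))


def empirical_quantiles(x, alphas):
    """NON-PRIVATE placeholder for A_quant: plain order statistics."""
    xs = sorted(x)
    n = len(xs)
    return [xs[min(n - 1, max(0, math.ceil(alpha * n) - 1))] for alpha in alphas]


def estimate_density(x, k, quantile_estimator=empirical_quantiles):
    """Steps 4 to 7: estimate k quantiles and put mass 1/k on each."""
    # Quantile levels 1/(2k), 3/(2k), ..., (2k-1)/(2k).
    alphas = [(2 * j - 1) / (2 * k) for j in range(1, k + 1)]
    q_tilde = quantile_estimator(x, alphas)
    mass = Counter()
    for q in q_tilde:
        # Assumption: coinciding quantile estimates accumulate mass.
        mass[q] += 1.0 / k
    return dict(mass)


if __name__ == "__main__":
    sample = list(range(1, 101))
    print(estimate_density(sample, k=4))
    print(choose_k(n=10**6, eps=1.0, beta=0.1, a=0.0, b=1.0, gamma=0.01))
```

Passing the quantile routine as an argument mirrors the postprocessing structure of the algorithm: replacing `empirical_quantiles` with an $\varepsilon$-DP estimator would make the whole procedure $\varepsilon$-DP without changing the remaining steps.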

Now, we are in a position to state our main theorem, which bounds the Wasserstein distance between the distribution output by our algorithm, and the underlying probability distribution P𝑃Pitalic_P.

Theorem 6.15.

Fix $\varepsilon,\beta\in(0,1]$, $a,b\in\mathbb{R}$, and $\gamma<b-a\in\mathbb{R}$ such that $\frac{b-a}{\gamma}$ is an integer. Let $n\in\mathbb{N}$ satisfy $n>c_{2}\frac{\log^{4}\frac{b-a}{\gamma\beta\varepsilon}}{\varepsilon}$ for some sufficiently large constant $c_{2}$. Let $P$ be any distribution supported on $\{a,a+\gamma,a+2\gamma,\dots,b-\gamma,b\}$, and ${\bf x}\sim P^{n}$.

Then, Algorithm 5, when given inputs ${\bf x}$, privacy parameter $\varepsilon$, interval end points $a,b$, and granularity $\gamma$, outputs a distribution $P^{DP}$ such that with probability at least $1-O(\beta)$ over the randomness of ${\bf x}$ and the algorithm,

\[
\mathcal{W}(P,P^{DP})\leq\sqrt{c\log n}\cdot\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right)\right]+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+\frac{2}{k}\left(q_{1-1/k}-q_{1/k}\right),
\]

where $\hat{P}_{n}$ is the uniform distribution on ${\bf x}$, $q_{\alpha}$ represents the $\alpha$-quantile of distribution $P$, $c,C^{\prime\prime}$ are sufficiently large constants, and $k=\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$, where $c_{3}$ is a sufficiently large constant.

We note that using more sophisticated differentially private CDF estimators to estimate quantiles (such as the ones in [BNSV15, CLN+23]), we can also obtain a version of the same theorem for approximate differential privacy, with a better dependence on the size of the domain $\frac{b-a}{\gamma}$ (only $\log^{*}(\frac{b-a}{\gamma})$ as opposed to $\operatorname{poly}\log(\frac{b-a}{\gamma})$, where $\log^{*}t$ is the number of times $\log$ has to be applied to $t$ to get it to be $\leq 1$).$^{5}$

$^{5}$ The theorem would be of the same form as Theorem 6.15, except that Algorithm 5 would be $(\varepsilon,\delta)$-DP, with the lower bound on $n$ instead being $n=\Omega\left(\frac{\operatorname{polylog}^{*}\frac{b-a}{\gamma\varepsilon\delta}\sqrt{\log 1/\delta}\,\log(1/\beta)}{\varepsilon}\right)$, and $k$ being set instead to $O\left(\frac{\varepsilon n}{\log^{*}\frac{b-a}{\gamma}\operatorname{polylog}\frac{n}{\beta}}\right)$.

To prove Theorem 6.15, we first relate the Wasserstein distance of interest (between the true distribution $P$ and the algorithm's output distribution $P^{DP}$) to a quantity related to an appropriately chosen restriction. Let $q_{\alpha}$ represent the $\alpha$-quantile of $P$, let $\hat{q}_{\alpha}$ represent the $\alpha$-quantile of $\hat{P}_{n}$, and let $\tilde{q}_{\alpha}$ represent the $\alpha$-quantile of $P^{DP}$. We also note that all these distributions (and others that will come up in the proof) are bounded distributions over the real line, so we can freely apply the triangle inequality for Wasserstein distance and the cumulative distribution formula for Wasserstein distance (Lemma 2.3). The proof of the main theorem will follow from the following lemmas (all proved in Appendix F).

Lemma 6.16.

Let $C^{\prime\prime}>0$ be a sufficiently large constant, and let $n>0$ be sufficiently large. With probability at least $1-O(\beta)$ over the randomness in data samples and Algorithm 5,

\[
\mathcal{W}(P,P^{DP})\leq\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}).
\]

Lemma 6.17 (Wasserstein in terms of quantiles).

For all datasets ${\bf x}$ (with data entries in $[a,b]$), with probability at least $1-\beta$ over the randomness of Algorithm 5, we have that

\[
\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\leq\frac{2}{k}\left(q_{1-1/k}-q_{1/k}\right),
\]

where $\hat{P}_{n}$ is the uniform distribution over ${\bf x}$.
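The distances in Lemmas 6.16 and 6.17 are between distributions on the real line, where the cumulative distribution formula applies. As a hedged illustration, assuming $\mathcal{W}$ denotes the 1-Wasserstein distance and that Lemma 2.3 is the standard identity $\mathcal{W}(P,Q)=\int|F_{P}(t)-F_{Q}(t)|\,dt$, the following sketch evaluates that integral exactly for finitely supported distributions. The function name `wasserstein1` and the toy distributions are hypothetical and are not taken from the paper.

```python
def wasserstein1(p, q):
    """1-Wasserstein distance between finitely supported distributions on R.

    p and q map support points to probabilities. The CDFs are step
    functions, so the integral of |F_p - F_q| is a finite sum over the
    gaps between consecutive support points.
    """
    points = sorted(set(p) | set(q))
    cdf_p = 0.0
    cdf_q = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


if __name__ == "__main__":
    # Toy example: an empirical distribution on four points versus a
    # two-point summary that places mass 1/2 on each of two quantiles.
    empirical = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    summary = {1: 0.5, 2: 0.5}
    print(wasserstein1(empirical, summary))
```

In the toy example the summary is obtained by moving mass 1/4 from 0 to 1 and mass 1/4 from 3 to 2, so the distance is 1/2.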

Now, we argue about the concentration of the Wasserstein distance between restrictions of the empirical distribution and restrictions of the true distribution.

Claim 6.18.

Fix $\beta\in(0,1)$ and sufficiently large constants $c_{3},c_{6}$. Let $n>0$ be sufficiently large such that $n>\log n/\beta$ (as in Theorem 6.15). For all $k$ such that $\frac{1}{k}>c_{3}\frac{\log\frac{n}{\beta}}{n}$, with probability at least $1-O(\beta)$ over the randomness in the data,

\[
\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\leq\sqrt{c_{6}\log\frac{n}{\beta}}\cdot\mathbb{E}\left[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right].
\]

Now, we give the proof of our main theorem.

Proof of Theorem 6.15.

Using Lemma 6.16, Claim 6.18 and the triangle inequality, we have that with probability at least $1-O(\beta)$ over the randomness of the data and the algorithm,

\begin{align*}
\mathcal{W}(P,P^{DP})&\leq\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\\
&\leq\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\\
&\leq\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+\sqrt{c_{6}\log\frac{n}{\beta}}\,\mathbb{E}\left[\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right]+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}).
\end{align*}

Finally, applying Lemma 6.17 and taking a union bound over failure probabilities, we get that with probability at least $1-O(\beta)$ over the randomness of the data and the algorithm,

\[
\mathcal{W}(P,P^{DP})\leq\frac{2}{k}\left(q_{1-1/k}-q_{1/k}\right)+\sqrt{c_{6}\log\frac{n}{\beta}}\,\mathbb{E}\left[\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right]+C^{\prime\prime}\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}),
\]

as required. ∎
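The bound derived in the proof carries the factor $\sqrt{c_{6}\log\frac{n}{\beta}}$, whereas Theorem 6.15 is stated with $\sqrt{c\log n}$. One way to reconcile the two is sketched below, under the extra assumption that $n\geq 1/\beta$. This assumption is made only for this remark and is not stated explicitly in the theorem. It would follow, for instance, if the condition in Claim 6.18 is read as $n>(\log n)/\beta$ and $n\geq 3$.

```latex
\[
n\geq\frac{1}{\beta}
\;\Longrightarrow\;
\log\frac{n}{\beta}=\log n+\log\frac{1}{\beta}\leq 2\log n
\;\Longrightarrow\;
\sqrt{c_{6}\log\frac{n}{\beta}}\leq\sqrt{2c_{6}\log n}.
\]
```

Under that assumption the bound takes the stated form with $c=2c_{6}$.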

References

  • [AAK21] Ishaq Aden-Ali, Hassan Ashtiani, and Gautam Kamath. On the sample complexity of privately learning unbounded high-dimensional gaussians. In Vitaly Feldman, Katrina Ligett, and Sivan Sabato, editors, Algorithmic Learning Theory, 16-19 March 2021, Virtual Conference, Worldwide, volume 132 of Proceedings of Machine Learning Research, pages 185–216. PMLR, 2021.
  • [AAL23a] Mohammad Afzali, Hassan Ashtiani, and Christopher Liaw. Mixtures of gaussians are privately learnable with a polynomial number of samples. CoRR, abs/2309.03847, 2023.
  • [AAL23b] Jamil Arbas, Hassan Ashtiani, and Christopher Liaw. Polynomial time and private learning of unbounded gaussian mixture models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 1018–1040. PMLR, 2023.
  • [ABC17] Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms. J. ACM, 64(1), mar 2017.
  • [AD20] Hilal Asi and John C. Duchi. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  • [ADJ+11] Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. In Sham M. Kakade and Ulrike von Luxburg, editors, COLT 2011 - The 24th Annual Conference on Learning Theory, June 9-11, 2011, Budapest, Hungary, volume 19 of JMLR Proceedings, pages 47–68. JMLR.org, 2011.
  • [ADJ+12] Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan, and Ananda Theertha Suresh. Competitive classification and closeness testing. In Shie Mannor, Nathan Srebro, and Robert C. Williamson, editors, COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, volume 23 of JMLR Proceedings, pages 22.1–22.18. JMLR.org, 2012.
  • [AJOS13a] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. A competitive test for uniformity of monotone distributions. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 - May 1, 2013, volume 31 of JMLR Workshop and Conference Proceedings, pages 57–65. JMLR.org, 2013.
  • [AJOS13b] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Optimal probability estimation with applications to prediction and classification. In Shai Shalev-Shwartz and Ingo Steinwart, editors, COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, volume 30 of JMLR Workshop and Conference Proceedings, pages 764–796. JMLR.org, 2013.
  • [AKT+23] Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, and Fred Zhang. Privately estimating a gaussian: Efficient, robust, and optimal. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 483–496. ACM, 2023.
  • [AL22] Hassan Ashtiani and Christopher Liaw. Private and polynomial time algorithms for learning gaussians and beyond. In Po-Ling Loh and Maxim Raginsky, editors, Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 1075–1076. PMLR, 2022.
  • [ALMM19] Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite littlestone dimension. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 852–860. ACM, 2019.
  • [ASSU24] Maryam Aliakbarpour, Rose Silver, Thomas Steinke, and Jonathan R. Ullman. Differentially private medians and interior points for non-pathological data. In Venkatesan Guruswami, editor, 15th Innovations in Theoretical Computer Science Conference, ITCS 2024, January 30 to February 2, 2024, Berkeley, CA, USA, volume 287 of LIPIcs, pages 3:1–3:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.
  • [ASZ17] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private testing of identity and closeness of discrete distributions. CoRR, abs/1707.05128, 2017.
  • [ASZ20] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private assouad, fano, and le cam. CoRR, abs/2004.06830, 2020.
  • [ASZ21] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private Assouad, Fano, and Le Cam. In Vitaly Feldman, Katrina Ligett, and Sivan Sabato, editors, Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, pages 48–78. PMLR, 16–19 Mar 2021.
  • [BA20] Victor-Emmanuel Brunel and Marco Avella-Medina. Propose, test, release: Differentially private estimation with high probability. CoRR, abs/2002.08774, 2020.
  • [Bar96] Yair Bartal. Probabilistic approximations of metric spaces and its algorithmic applications. In 37th Annual Symposium on Foundations of Computer Science, FOCS ’96, Burlington, Vermont, USA, 14-16 October, 1996, pages 184–193. IEEE Computer Society, 1996.
  • [BBDS13] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Robert D. Kleinberg, editor, Innovations in Theoretical Computer Science, ITCS ’13, Berkeley, CA, USA, January 9-12, 2013, pages 87–96. ACM, 2013.
  • [BG14] Emmanuel Boissard and Thibaut Le Gouic. On the mean speed of convergence of empirical and occupation measures in Wasserstein distance. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 50(2):539 – 563, 2014.
  • [BGS+21] Gavin Brown, Marco Gaboardi, Adam D. Smith, Jonathan R. Ullman, and Lydia Zakynthinou. Covariance-aware private mean estimation without private covariance estimation. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 7950–7964, 2021.
  • [BHS23] Gavin Brown, Samuel B. Hopkins, and Adam Smith. Fast, sample-efficient, affine-invariant private mean and covariance estimation for subgaussian distributions, 2023.
  • [BKM+21] Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, and Marco Gruteser. Towards sparse federated analytics: Location heatmaps under distributed differential privacy with secure aggregation. arXiv preprint arXiv:2111.02356, 2021.
  • [BKSW21] Mark Bun, Gautam Kamath, Thomas Steinke, and Zhiwei Steven Wu. Private hypothesis selection. IEEE Trans. Inf. Theory, 67(3):1981–2000, 2021.
  • [BL19] Sergey G. Bobkov and Michel Ledoux. One-dimensional empirical measures, order statistics, and kantorovich transport distances. Memoirs of the American Mathematical Society, 2019.
  • [BM23] Daniel Bartl and Shahar Mendelson. On a variance dependent Dvoretzky-Kiefer-Wolfowitz inequality. arXiv e-prints, page arXiv:2308.04757, August 2023.
  • [BNNR09] Khanh Do Ba, Huy L. Nguyen, Huy Ngoc Nguyen, and Ronitt Rubinfeld. Sublinear time algorithms for earth mover’s distance. Theory of Computing Systems, 48:428–442, 2009.
  • [BNS16] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. Theory Comput., 12(1):1–61, 2016.
  • [BNSV15] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. CoRR, abs/1504.07553, 2015.
  • [BS19] Mark Bun and Thomas Steinke. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 181–191, 2019.
  • [BSV22] March Boedihardjo, Thomas Strohmer, and Roman Vershynin. Private measures, random walks, and synthetic data, 2022.
  • [BUV18] Mark Bun, Jonathan R. Ullman, and Salil P. Vadhan. Fingerprinting codes and the price of approximate differential privacy. SIAM J. Comput., 47(5):1888–1938, 2018.
  • [BY02] Z. Bar-Yossef. The Complexity of Massive Data Set Computations. University of California, Berkeley, 2002.
  • [Can17] Clément L. Canonne. A short note on distinguishing discrete distributions, 2017.
  • [CB22] Graham Cormode and Akash Bharadwaj. Sample-and-threshold differential privacy: Histograms and applications. In International Conference on Artificial Intelligence and Statistics, pages 1420–1431. PMLR, 2022.
  • [CCD+23] Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, and Kunal Talwar. Differentially private heavy hitter detection using federated analytics, 2023.
  • [CD20] Rachel Cummings and David Durfee. Individual sensitivity preprocessing for data privacy. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 528–547. SIAM, 2020.
  • [CDK17] Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. Priv’it: Private and sample efficient identity testing. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 635–644. PMLR, 2017.
  • [CKM+19] Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam D. Smith, and Jonathan R. Ullman. The structure of optimal private tests for simple hypotheses. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 310–321. ACM, 2019.
  • [CL15] T. Tony Cai and Mark G. Low. A framework for estimation of convex functions. Statistica Sinica, 25(2):423–456, 2015.
  • [CLN+23] Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, and Uri Stemmer. Optimal differentially private learning of thresholds and quasi-concave optimization. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 472–482. ACM, 2023.
  • [CR12] Guillermo D. Cañas and Lorenzo Rosasco. Learning probability measures with respect to optimal transport metrics. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 2501–2509, 2012.
  • [CSS11] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Trans. Inf. Syst. Secur., 14(3):26:1–26:24, 2011.
  • [CWZ19] T. Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. CoRR, abs/1902.04495, 2019.
  • [CZ13] Shixi Chen and Shuigeng Zhou. Recursive mechanism: towards node differential privacy and unrestricted joins. In Kenneth A. Ross, Divesh Srivastava, and Dimitris Papadias, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, pages 653–664. ACM, 2013.
  • [DHS15] Ilias Diakonikolas, Moritz Hardt, and Ludwig Schmidt. Differentially private learning of structured discrete distributions. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2566–2574, 2015.
  • [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT ’06, pages 486–503, St. Petersburg, Russia, 2006.
  • [DKSS23] Travis Dick, Alex Kulesza, Ziteng Sun, and Ananda Theertha Suresh. Subset-based instance optimality in private estimation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 7992–8014. PMLR, 2023.
  • [DL91] David L. Donoho and Richard C. Liu. Geometrizing Rates of Convergence, II. The Annals of Statistics, 19(2):633 – 667, 1991.
  • [DL09] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 371–380. ACM, 2009.
  • [DLSV23] Trung Dang, Jasper C.H. Lee, Maoyuan Song, and Paul Valiant. Optimality in mean estimation: Beyond worst-case, beyond sub-gaussian, and beyond $1+\alpha$ moments. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [DMNS17] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality, 7(3):17–51, 2017.
  • [DNPR10] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 715–724. ACM, 2010.
  • [DR18] John C. Duchi and Feng Ruan. The right complexity measure in locally private estimation: It is not the fisher information. CoRR, abs/1806.05756, 2018.
  • [DSS11] Steffen Dereich, Michael Scheutzow, and Reik Schottstedt. Constructive quantization: Approximation by empirical measures. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 49:1183–1203, 2011.
  • [Dud69] R. M. Dudley. The speed of mean glivenko-cantelli convergence. The Annals of Mathematical Statistics, 40(1):40–50, 1969.
  • [DY95] Vladimir Dobric and Joseph E. Yukich. Asymptotics for transportation cost in high dimensions. Journal of Theoretical Probability, 8:97–118, 1995.
  • [FG15] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707, August 2015.
  • [FLN01] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’01, page 102–113, New York, NY, USA, 2001. Association for Computing Machinery.
  • [Fou23] Nicolas Fournier. Convergence of the empirical measure in expected wasserstein distance: non asymptotic explicit bounds in $\mathbb{R}^{d}$, 2023.
  • [FRT03] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC ’03, page 448–455, New York, NY, USA, 2003. Association for Computing Machinery.
  • [GHK+23] Badih Ghazi, Junfeng He, Kai Kohlhoff, Ravi Kumar, Pasin Manurangsi, Vidhya Navalpakkam, and Nachiappan Valliappan. Differentially private heatmaps. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pages 7696–7704. AAAI Press, 2023.
  • [GJK21] Jennifer Gillenwater, Matthew Joseph, and Alex Kulesza. Differentially private quantiles. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 3713–3722. PMLR, 2021.
  • [GKN20] Tomer Grossman, Ilan Komargodski, and Moni Naor. Instance Complexity and Unlabeled Certificates in the Decision Tree Model. In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151 of Leibniz International Proceedings in Informatics (LIPIcs), pages 56:1–56:38, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
  • [HKM22] Samuel B. Hopkins, Gautam Kamath, and Mahbod Majid. Efficient mean estimation with pure differential privacy via a sum-of-squares exponential mechanism. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 1406–1417. ACM, 2022.
  • [HKMN23] Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, and Shyam Narayanan. Robustness implies privacy in statistical estimation. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 497–506. ACM, 2023.
  • [HLY21] Ziyue Huang, Yuting Liang, and Ke Yi. Instance-optimal mean estimation under differential privacy. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 25993–26004, 2021.
  • [HO19] Yi Hao and Alon Orlitsky. Doubly-competitive distribution estimation. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2614–2623. PMLR, 2019.
  • [HVZ23] Yiyun He, Roman Vershynin, and Yizhe Zhu. Algorithmically effective differentially private synthetic data, 2023.
  • [KDH23] Rohith Kuditipudi, John C. Duchi, and Saminul Haque. A pretty fast algorithm for adaptive private mean estimation. In Gergely Neu and Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 of Proceedings of Machine Learning Research, pages 2511–2551. PMLR, 2023.
  • [KLM+20] Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. Privately learning thresholds: Closing the exponential gap. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 2263–2285. PMLR, 2020.
  • [KLSU19] Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan R. Ullman. Privately learning high-dimensional distributions. In Alina Beygelzimer and Daniel Hsu, editors, Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 1853–1902. PMLR, 2019.
  • [KMS22a] Gautam Kamath, Argyris Mouzakis, and Vikrant Singhal. New lower bounds for private estimation and a generalized fingerprinting lemma. In NeurIPS, 2022.
  • [KMS+22b] Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, and Jonathan R. Ullman. A private and computationally-efficient estimator for unbounded gaussians. In Po-Ling Loh and Maxim Raginsky, editors, Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 544–572. PMLR, 2022.
  • [KMV22] Pravesh Kothari, Pasin Manurangsi, and Ameya Velingker. Private robust estimation by stabilizing convex relaxations. In Po-Ling Loh and Maxim Raginsky, editors, Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 723–777. PMLR, 2022.
  • [KNRS13] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. Analyzing graphs with node differential privacy. In Amit Sahai, editor, Theory of Cryptography - 10th Theory of Cryptography Conference, TCC 2013, Tokyo, Japan, March 3-6, 2013. Proceedings, volume 7785 of Lecture Notes in Computer Science, pages 457–476. Springer, 2013.
  • [KSS22] Haim Kaplan, Shachar Schnapp, and Uri Stemmer. Differentially private approximate quantiles. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 10751–10761. PMLR, 2022.
  • [KSSU19] Gautam Kamath, Or Sheffet, Vikrant Singhal, and Jonathan R. Ullman. Differentially private algorithms for learning mixtures of separated gaussians. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 168–180, 2019.
  • [KSU20] Gautam Kamath, Vikrant Singhal, and Jonathan R. Ullman. Private mean estimation of heavy-tailed distributions. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 2204–2235. PMLR, 2020.
  • [KU20] Gautam Kamath and Jonathan R. Ullman. A primer on private statistics. CoRR, abs/2005.00010, 2020.
  • [KV18] Vishesh Karwa and Salil P. Vadhan. Finite sample differentially private confidence intervals. In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, volume 94 of LIPIcs, pages 44:1–44:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
  • [Lei20] Jing Lei. Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces. Bernoulli, 26(1):767 – 798, 2020.
  • [LKO22] Xiyang Liu, Weihao Kong, and Sewoong Oh. Differential privacy and robust statistics in high dimensions. In Po-Ling Loh and Maxim Raginsky, editors, Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 1167–1246. PMLR, 2022.
  • [MJT+22] Audra McMillan, Omid Javidbakht, Kunal Talwar, Elliot Briggs, Mike Chatzidakis, Junye Chen, John Duchi, Vitaly Feldman, Yusuf Goren, Michael Hesse, Vojta Jina, Anil Katti, Albert Liu, Cheney Lyford, Joey Meyer, Alex Palmer, David Park, Wonhee Park, Gianni Parsa, Paul Pelzl, Rehan Rishi, Congzheng Song, Shan Wang, and Shundong Zhou. Private federated statistics in an interactive setting. arXiv preprint arXiv:2211.10082, 2022.
  • [MSU22] Audra McMillan, Adam D. Smith, and Jonathan R. Ullman. Instance-optimal differentially private estimation. CoRR, abs/2210.15819, 2022.
  • [Nar23] Shyam Narayanan. Better and simpler lower bounds for differentially private statistical estimation. CoRR, abs/2310.06289, 2023.
  • [NRS07] Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. Smooth sensitivity and sampling in private data analysis. In David S. Johnson and Uriel Feige, editors, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 75–84. ACM, 2007.
  • [NWB19] Jonathan Niles-Weed and Quentin Berthet. Minimax estimation of smooth densities in wasserstein distance. The Annals of Statistics, 2019.
  • [OS15] Alon Orlitsky and Ananda Theertha Suresh. Competitive distribution estimation: Why is good-turing good. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2143–2151, 2015.
  • [QYL12] Wahbeh Qardaji, Weining Yang, and Ninghui Li. Differentially private grids for geospatial data. Proceedings - International Conference on Data Engineering, 09 2012.
  • [Rou21] Tim Roughgarden. Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press, 2021.
  • [RS16] Sofya Raskhodnikova and Adam D. Smith. Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 495–504. IEEE Computer Society, 2016.
  • [Sin23] Vikrant Singhal. A polynomial time, pure differentially private estimator for binary product distributions. CoRR, abs/2304.06787, 2023.
  • [SP19] Shashank Singh and Barnabás Póczos. Minimax distribution estimation in wasserstein distance, 2019.
  • [TCK+22] Eliad Tsfadia, Edith Cohen, Haim Kaplan, Yishay Mansour, and Uri Stemmer. Friendlycore: Practical differentially private aggregation. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 21828–21863. PMLR, 2022.
  • [vdV97] A. W. van der Vaart. Superefficiency, pages 397–410. Springer New York, New York, NY, 1997.
  • [Vov09] Vladimir Vovk. Superefficiency from the Vantage Point of Computability. Statistical Science, 24(1):73 – 86, 2009.
  • [VV16] Gregory Valiant and Paul Valiant. Instance optimal learning of discrete distributions. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 142–155. ACM, 2016.
  • [WB19] Jonathan Weed and Francis Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in wasserstein distance. Bernoulli, 25(4 A):2620–2648, 2019.
  • [Wol65] J. Wolfowitz. Asymptotic efficiency of the maximum likelihood estimator. Theory of Probability & Its Applications, 10(2):247–260, 1965.
  • [ZKM+20] Wennan Zhu, Peter Kairouz, Brendan McMahan, Haicheng Sun, and Wei Li. Federated heavy hitters discovery with differential privacy. In International Conference on Artificial Intelligence and Statistics, pages 3837–3847. PMLR, 2020.
  • [ZXX16] Jun Zhang, Xiaokui Xiao, and Xing Xie. Privtree: A differentially private algorithm for hierarchical decompositions. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, page 155–170, New York, NY, USA, 2016. Association for Computing Machinery.

Appendix A Preliminaries

A.1 Distribution Distances

A number of other distances between distributions are used in this work.

Definition A.1 ($KL$-divergence).

Given two distributions $P$ and $Q$ with $supp(P)\subseteq supp(Q)$, the KL divergence $KL(P,Q)=\sum_{t\in supp(P)}P(t)\ln\frac{P(t)}{Q(t)}$, if $P$ and $Q$ are discrete, and $KL(P,Q)=\int_{t\in\mathbb{R}:f_{P}(t)>0}f_{P}(t)\ln\frac{f_{P}(t)}{f_{Q}(t)}dt$ if $P$ and $Q$ are distributions on $\mathbb{R}$, and have density functions. If $supp(P)\not\subseteq supp(Q)$, then $KL(P,Q)=\infty$.

Definition A.2 (Hellinger distance).

Given two distributions $P$ and $Q$, the Hellinger distance is $H(P,Q)=\frac{1}{\sqrt{2}}\|\sqrt{P}-\sqrt{Q}\|_{2}$ if $P$ and $Q$ are discrete (where we think of $P$ and $Q$ as vectors representing the probability masses, with the square root taken component-wise). If $P$ and $Q$ are distributions on $\mathbb{R}$, and have density functions, then $H(P,Q)=\frac{1}{\sqrt{2}}\sqrt{\int_{t\in\mathbb{R}:f_{P}(t)>0}(\sqrt{f_{P}(t)}-\sqrt{f_{Q}(t)})^{2}dt}$.

Note that we use $H^{2}(P,Q)$ to represent the squared Hellinger distance. Next, we define total variation distance, which will come up in our high-dimensional results.

Definition A.3 (Total Variation distance).

Given two discrete distributions $P$ and $Q$, the Total Variation distance $TV(P,Q)=\frac{1}{2}\|P-Q\|_{1}$ (where we think of $P$ and $Q$ as vectors representing the probability masses). More generally, for any two probability measures $P$ and $Q$ defined on $(\Omega,\mathcal{F})$, the total variation distance is defined as $\sup_{A\in\mathcal{F}}|P(A)-Q(A)|$ where $P(A)$ represents the probability of $A$ under measure $P$ and likewise for $Q$.

We use the following relationships between the squared Hellinger distance, the KL divergence, and the total variation distance.

Lemma A.4.

For all distributions $P,Q$ such that KL-divergence of $P,Q$ is well defined, we have that

$$H^{2}(P,Q)\leq KL(P,Q),\quad H^{2}(P,Q)\leq TV(P,Q) \tag{10}$$
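As a quick numerical illustration of Definitions A.1 to A.3 and of Lemma A.4 in the discrete case, the following is a minimal sketch. The function names and the example distributions `P` and `Q` are assumptions made for illustration and do not come from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(P, Q) for discrete distributions given as equal-length lists of masses."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # supp(P) is not contained in supp(Q)
            total += pi * math.log(pi / qi)
    return total

def hellinger(p, q):
    """H(P, Q) = (1 / sqrt(2)) * || sqrt(P) - sqrt(Q) ||_2."""
    sq = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * sq)

def total_variation(p, q):
    """TV(P, Q) = (1 / 2) * || P - Q ||_1."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Illustrative two-point distributions (hypothetical values).
P = [1 / 3, 2 / 3]
Q = [1 / 2, 1 / 2]

h2 = hellinger(P, Q) ** 2
kl = kl_divergence(P, Q)
tv = total_variation(P, Q)
print(h2 <= kl, h2 <= tv)
```

Checking one pair of distributions does not prove the lemma, but it is a cheap sanity check on an implementation of the three distances.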

A.2 Differential Privacy

Lemma A.5 (Post-Processing [DMNS17]).

If Algorithm $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathcal{Y}$ is $(\varepsilon,\delta)$-differentially private, and $\mathcal{B}:\mathcal{Y}\rightarrow\mathcal{Z}$ is any randomized function, then the algorithm $\mathcal{B}\circ\mathcal{A}$ is $(\varepsilon,\delta)$-differentially private.

Secondly, differential privacy is robust to adaptive composition.

Lemma A.6 (Composition of $(\varepsilon,\delta)$-differential privacy [DMNS17]).

If $\mathcal{A}$ is an adaptive composition of $m$ differentially private algorithms $\mathcal{A}_{1},\ldots,\mathcal{A}_{m}$, where $\mathcal{A}_{j}$ is $(\varepsilon_{j},\delta_{j})$-differentially private, then $\mathcal{A}$ is $\left(\sum_{j}\varepsilon_{j},\sum_{j}\delta_{j}\right)$-differentially private.
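The accounting in the composition lemma amounts to summing the privacy parameters of the individual steps. A minimal sketch follows; the helper name `basic_composition` and the example budgets are hypothetical.

```python
def basic_composition(budgets):
    """Total (epsilon, delta) of an adaptive composition.

    `budgets` is a list of (epsilon_j, delta_j) pairs, one per mechanism.
    """
    total_eps = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_eps, total_delta

# Three hypothetical steps; the second one is pure epsilon-DP (delta = 0).
steps = [(0.5, 1e-6), (0.25, 0.0), (0.25, 1e-6)]
total_eps, total_delta = basic_composition(steps)
```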

Finally, we discuss the Laplace mechanism, which we will use in one of our algorithms.

Definition A.7 ($\ell_{1}$-Sensitivity).

The $\ell_{1}$-sensitivity of a function $f:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ is

$$\Delta_{f}=\max_{\substack{\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{X}^{n}\\ d_{ham}(\mathbf{x},\mathbf{x}^{\prime})\leq 1}}\|f(\mathbf{x})-f(\mathbf{x}^{\prime})\|_{1}.$$

Lemma A.8 (Laplace Mechanism).

Let f:𝒳nd:𝑓superscript𝒳𝑛superscript𝑑f:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}italic_f : caligraphic_X start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT be a function with 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-sensitivity ΔfsubscriptΔ𝑓\Delta_{f}roman_Δ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. Then the Laplacian mechanism is algorithm

𝒜f(𝐱)=f(𝐱)+(Z1,,Zd),subscript𝒜𝑓𝐱𝑓𝐱subscript𝑍1subscript𝑍𝑑\mathcal{A}_{f}({\bf x})=f({\bf x})+(Z_{1},\ldots,Z_{d}),caligraphic_A start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( bold_x ) = italic_f ( bold_x ) + ( italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) ,

where ZiLap(Δfε)similar-tosubscript𝑍𝑖𝐿𝑎𝑝subscriptΔ𝑓𝜀Z_{i}\sim Lap\left(\frac{\Delta_{f}}{\varepsilon}\right)italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ italic_L italic_a italic_p ( divide start_ARG roman_Δ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_ARG start_ARG italic_ε end_ARG ) (and Z1,,Zdsubscript𝑍1subscript𝑍𝑑Z_{1},\dots,Z_{d}italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT are mutually independent). Algorithm 𝒜fsubscript𝒜𝑓\mathcal{A}_{f}caligraphic_A start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT is ε𝜀\varepsilonitalic_ε-DP.
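The Laplace mechanism can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the paper: the helper names are hypothetical, and the histogram query is only an example of a function whose $\ell_1$-sensitivity is 2 when neighbouring datasets differ by replacing one record.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Lap(scale) sample as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add independent Lap(sensitivity / epsilon) noise to each coordinate."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def histogram(data, num_bins):
    """Count how many records fall in each bin (records are bin indices)."""
    counts = [0] * num_bins
    for x in data:
        counts[x] += 1
    return counts


# Replacing one record changes two counts by 1 each, so the sensitivity is 2.
counts = histogram([0, 1, 1, 3], num_bins=4)
noisy_counts = laplace_mechanism(counts, sensitivity=2.0, epsilon=1.0)
```

Note that floating-point implementations of Laplace noise are known to need extra care in deployed systems; the sketch ignores this.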

Appendix B Experiment Details

Below we describe the experiment referenced in the introduction.

The distribution: We take a distribution on $[0,999]$ that is concentrated on the two points $430$ and $440$, with $p_{430}=\frac{1}{3}$ and $p_{440}=\frac{2}{3}$. Both algorithms were run with $n=1600$ samples from this distribution.

Minimax Optimal Algorithm: The minimax-optimal algorithm here is the algorithm PSMM from [HVZ23], which considers a fixed partitioning of the interval into $\Omega(m^{\frac{1}{d}})$ equal intervals and places the empirical mass of each interval on an arbitrary point in that interval. We run this algorithm with $\varepsilon=\infty$, so that no noise is added, and with $K=40$ buckets.

Instance-optimal Algorithm: The instance-optimal algorithm finds $k$ quantiles as in Algorithm 5. In this implementation we used the recursive exponential mechanism of [KSS22], but we expect other quantile algorithms to work similarly. We use $k=10$ quantiles with $\varepsilon=1$.
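The qualitative gap between the two approaches can be reproduced with a rough, non-private Python sketch. It is not the authors' code and rests on several assumptions: the domain is treated as 1000 integer points so that $K=40$ buckets have width 25, each bucket is represented by its midpoint, and plain empirical quantiles with mass $1/k$ each stand in for the recursive exponential mechanism of [KSS22] and for Algorithm 5. All function names are hypothetical.

```python
import random


def sample_two_point(n, rng):
    """Draw n samples with P(430) = 1/3 and P(440) = 2/3."""
    return [430 if rng.random() < 1 / 3 else 440 for _ in range(n)]


def bucket_estimate(samples, num_buckets=40, domain_size=1000):
    """Fixed partition: move each bucket's empirical mass to its midpoint."""
    width = domain_size / num_buckets
    estimate = {}
    for x in samples:
        midpoint = (int(x // width) + 0.5) * width
        estimate[midpoint] = estimate.get(midpoint, 0.0) + 1.0 / len(samples)
    return estimate


def quantile_estimate(samples, k=10):
    """Place mass 1/k on each of k evenly spaced empirical quantiles
    (non-private stand-in for a DP quantile algorithm)."""
    ordered = sorted(samples)
    estimate = {}
    for i in range(k):
        q = ordered[int((i + 0.5) / k * len(ordered))]
        estimate[q] = estimate.get(q, 0.0) + 1.0 / k
    return estimate


def wasserstein_1d(p, q):
    """1-Wasserstein distance between finite distributions {point: mass},
    computed as the integral of the absolute CDF difference."""
    points = sorted(set(p) | set(q))
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total


rng = random.Random(0)
samples = sample_two_point(1600, rng)
truth = {430: 1 / 3, 440: 2 / 3}
bucket_error = wasserstein_1d(truth, bucket_estimate(samples))
quantile_error = wasserstein_1d(truth, quantile_estimate(samples))
```

Under these assumptions both support points fall in the same bucket, so the fixed partition cannot separate them, whereas the quantile-based estimate places its mass on the two points themselves.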

Appendix C Appendix for Section 5

See 5.6

Proof of Theorem 5.6.

Given a distribution $P$, let

\[
\mathcal{A}^*_P=\arg\min_{\mathcal{A}\text{ is }\varepsilon\text{-DP}}\max_{Q\in\mathcal{N}(P)}\mathbb{E}_{D\sim Q^n}[\mathcal{W}(Q,\mathcal{A}(D))]
\]

so $\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)=\max_{Q\in\mathcal{N}(P)}\mathbb{E}_{D\sim Q^n}[\mathcal{W}(Q,\mathcal{A}^*_P(D))]$. Let $\ell$ be a level of the tree. We want to define an algorithm $\mathcal{A}^*_{P_\ell}$ on the distributions in $\mathcal{N}_\ell(P_\ell)$ that achieves maximum error rate $\frac{1}{r_\ell}\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)$.

Define a randomised function $g_P$ as follows: given a node $\nu_\ell$ at level $\ell$, $g_P(\nu_\ell)$ is sampled from the distribution $P$ restricted to the leaf nodes that are descendants of $\nu_\ell$. Given a set $D$ of nodes at level $\ell$, define $g_P(D)$ to be the set obtained by applying $g_P$ to each element individually. Then define $\mathcal{A}^*_{P_\ell}(D)=(\mathcal{A}^*_P(g_P(D)))_\ell$. Since $g_P$ is applied individually to each element of $D$, neighbouring datasets are mapped to neighbouring datasets, and so $\mathcal{A}^*_{P_\ell}$ is $\varepsilon$-DP.

Given a distribution $Q^\ell\in\mathcal{N}_\ell(P_\ell)$, define a distribution $Q$ on the leaves of the tree as follows:

\[
Q(\nu)=\frac{Q^\ell(\nu_\ell)}{P_\ell(\nu_\ell)}\cdot P(\nu),
\]

where $\nu_\ell$ is the ancestor of $\nu$ at level $\ell$. Note that $Q\in\mathcal{N}(P)$, $g_P(Q^\ell)=Q$ and $Q_\ell=Q^\ell$. Now,

\begin{align*}
\mathrm{TV}(Q^\ell,\mathcal{A}^*_{P_\ell}(D))&=\mathrm{TV}(Q_\ell,(\mathcal{A}^*_P(g_P(D)))_\ell)\\
&\leq\frac{1}{r_\ell}\sum_{\ell'}r_{\ell'}\,\mathrm{TV}(Q_{\ell'},(\mathcal{A}^*_P(g_P(D)))_{\ell'})\\
&=\frac{1}{r_\ell}\mathcal{W}(Q,\mathcal{A}^*_P(g_P(D))),
\end{align*}

where the sum ranges over all levels $\ell'$ of the tree, the first equality follows by definition of $\mathcal{A}^*_{P_\ell}$ and the fact that $Q_\ell=Q^\ell$, and the inequality holds because the term with $\ell'=\ell$ is one of the non-negative summands. Since $g_P(Q^\ell)=Q$, this implies that for all distributions $Q^\ell$ in $\mathcal{N}_\ell(P_\ell)$,

\[
\mathbb{E}_{D\sim Q^\ell}\left[\mathrm{TV}(Q^\ell,\mathcal{A}^*_{P_\ell}(D))\right]\leq\mathbb{E}_{D\sim Q}\left[\frac{1}{r_\ell}\mathcal{W}(Q,\mathcal{A}^*_P(D))\right]\leq\frac{1}{r_\ell}\mathcal{R}_{\mathcal{N},n,\varepsilon}(P),
\]

which implies that for all levels $\ell$, $\mathcal{R}_{\mathcal{N}_\ell,n,\varepsilon}(P_\ell)\leq\frac{1}{r_\ell}\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)$, and so we are done. ∎
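The lifting step in this proof, which extends a distribution on level-$\ell$ nodes to a distribution on the leaves, can be checked numerically. The Python sketch below is illustrative only: the tree is encoded by a hypothetical `parent` map from each leaf to its level-$\ell$ ancestor, and the example masses are made up.

```python
def level_marginal(leaf_dist, parent):
    """Aggregate leaf masses onto their level-l ancestors."""
    marginal = {}
    for leaf, mass in leaf_dist.items():
        node = parent[leaf]
        marginal[node] = marginal.get(node, 0.0) + mass
    return marginal


def lift_to_leaves(p_leaf, parent, q_level):
    """Return Q(leaf) = Q^l(ancestor) / P_l(ancestor) * P(leaf)."""
    p_level = level_marginal(p_leaf, parent)
    return {
        leaf: q_level[parent[leaf]] / p_level[parent[leaf]] * mass
        for leaf, mass in p_leaf.items()
    }


# Hypothetical tree: leaves a and b sit below node u, leaf c sits below node v.
parent = {"a": "u", "b": "u", "c": "v"}
p_leaf = {"a": 0.1, "b": 0.3, "c": 0.6}
q_level = {"u": 0.5, "v": 0.5}

q_leaf = lift_to_leaves(p_leaf, parent, q_level)
q_back = level_marginal(q_leaf, parent)  # should agree with q_level
```

Every leaf ratio $Q(\nu)/P(\nu)$ equals the ratio $Q^\ell(\nu_\ell)/P_\ell(\nu_\ell)$ at its ancestor, so a multiplicative closeness condition at level $\ell$ carries over to the leaves.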

See 5.8

Proof of Lemma 5.8.

We will follow the proof of Theorem 3 in [ASZ21]. Given an estimator $\mathcal{A}$, define a classifier $\mathcal{A}^*$ by projecting on the product of hypercubes so

\[
\mathcal{A}^*(X)=\arg\min_{u\in(\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots)}d(\mathcal{A}(X),\theta(p_u)).
\]

By the triangle inequality and the definition of $\mathcal{A}^*$, for any $p\in\mathcal{V}$,

\[
d(\theta(p_{\mathcal{A}^*(X)}),\theta(p))\leq d(\mathcal{A}(X),\theta(p_{\mathcal{A}^*(X)}))+d(\mathcal{A}(X),\theta(p))\leq 2d(\mathcal{A}(X),\theta(p)).
\]

Therefore, we can restrict to a lower bound on the performance of DP classifiers:

\begin{equation}
\min_{\mathcal{A}\text{ is }(\varepsilon,\delta)\text{-DP}}\max_{p\in\mathcal{V}}\mathcal{R}_{\mathcal{A},n}(p)\geq\frac{1}{2}\min_{\mathcal{A}^*\text{ is }(\varepsilon,\delta)\text{-DP}}\max_{p\in\mathcal{V}}\mathbb{E}_{X\sim p^n}[d(\theta(p_{\mathcal{A}^*(X)}),\theta(p))]. \tag{11}
\end{equation}

Also, for any $(\varepsilon,\delta)$-DP classifier $\mathcal{A}^*$,

\begin{align*}
\max_{p\in\mathcal{V}}\mathbb{E}_{X\sim p^n}[d(\theta(p_{\mathcal{A}^*(X)}),\theta(p))]
&\geq\frac{1}{|\mathcal{V}|}\sum_{u\in(\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots)}\mathbb{E}_{X\sim p_u^n}[d(\theta(p_{\mathcal{A}^*(X)}),\theta(p_u))]\\
&\geq\frac{2}{|\mathcal{V}|}\sum_s\tau_s\sum_{j=1}^{k_s}\sum_{u\in(\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots)}\Pr_{X\sim p_u^n}(\mathcal{A}^*(X)_j^s\neq u_j^s),
\end{align*}

where the first inequality follows from the fact that the max is greater than the average, and the second follows from assumption (5). For each $(s,j)$ pair, we divide $\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots$ into two groups according to the sign of $u_j^s$:

\begin{align*}
\max_{p\in\mathcal{V}}\,&\mathbb{E}_{X\sim p^n}[d(\theta(p_{\mathcal{A}^*(X)}),\theta(p))]\\
&\geq\frac{2}{|\mathcal{V}|}\sum_s\tau_s\sum_{j=1}^{k_s}\left[\sum_{u\in(\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots)\;|\;u_j^s=+1}\Pr_{X\sim p_u^n}(\mathcal{A}^*(X)_j^s\neq u_j^s)+\sum_{u\in(\mathcal{E}_{k_0}\times\mathcal{E}_{k_1}\times\cdots)\;|\;u_j^s=-1}\Pr_{X\sim p_u^n}(\mathcal{A}^*(X)_j^s\neq u_j^s)\right]\\
&\geq\sum_s\tau_s\sum_{j=1}^{k_s}\left(\Pr_{X\sim p_{+(s,j)}^n}(\mathcal{A}^*(X)_j^s\neq+1)+\Pr_{X\sim p_{-(s,j)}^n}(\mathcal{A}^*(X)_j^s\neq-1)\right)\\
&\geq\sum_s\tau_s\sum_{j=1}^{k_s}\left(\Pr_{X\sim p_{+(s,j)}^n}(\phi_{s,j}(X)\neq+1)+\Pr_{X\sim p_{-(s,j)}^n}(\phi_{s,j}(X)\neq-1)\right).
\end{align*}

Combining with Equation (11) we have the first statement. Next, since for each pair $(s,j)$ there exists a coupling $(X,Y)$ between $p_{+(s,j)}$ and $p_{-(s,j)}$ such that $\mathbb{E}[d_{Ham}(X,Y)]\leq D_s$, we can use the DP version of Le Cam's method from [ASZ21] to give, for any classifier $\phi_{s,j}$,

\[
\Pr_{X\sim p_{+(s,j)}^n}(\phi_{s,j}(X)\neq+1)+\Pr_{X\sim p_{-(s,j)}^n}(\phi_{s,j}(X)\neq-1)\geq\frac{1}{2}\left(0.9e^{-10\varepsilon D_s}-10D_s\delta\right),
\]

which implies the final result. ∎
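To get a feel for when the private Le Cam bound used above is non-trivial, the right-hand side can be evaluated numerically. The Python sketch below only evaluates the displayed expression; the function name and the parameter values are hypothetical.

```python
import math


def dp_le_cam_bound(epsilon, delta, expected_hamming):
    """Evaluate 0.5 * (0.9 * exp(-10 * eps * D_s) - 10 * D_s * delta),
    where `expected_hamming` plays the role of D_s."""
    return 0.5 * (
        0.9 * math.exp(-10.0 * epsilon * expected_hamming)
        - 10.0 * expected_hamming * delta
    )


# Hypothetical values: the bound shrinks as the coupling distance D_s grows.
small_distance = dp_le_cam_bound(epsilon=1.0, delta=1e-6, expected_hamming=0.1)
large_distance = dp_le_cam_bound(epsilon=1.0, delta=1e-6, expected_hamming=1.0)
```

The expression stays bounded away from zero when $D_s$ is of order $1/\varepsilon$ and $\delta$ is small, which is the regime in which the lower bound is informative.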

See 5.12

Proof of Lemma 5.12.

A standard result in the statistics literature states that for any pair of distributions $P$ and $Q$,

\[
\min_\phi\left(\Pr_{X\sim P^n}(\phi(X)=1)+\Pr_{X\sim Q^n}(\phi(X)=-1)\right)=\frac{1}{2}\left(1-\mathrm{TV}(P^n,Q^n)\right)\geq\frac{1}{2}\left(1-\sqrt{n\,\mathrm{KL}(P,Q)}\right),
\]

where the minimum is over all binary classifiers. If P=Bernoulli(pα)𝑃Bernoulli𝑝𝛼P=\texttt{Bernoulli}(p-\alpha)italic_P = Bernoulli ( italic_p - italic_α ) and Q=Bernoulli(p+α)𝑄Bernoulli𝑝𝛼Q=\texttt{Bernoulli}(p+\alpha)italic_Q = Bernoulli ( italic_p + italic_α ) where 0α12L(p)0𝛼12𝐿𝑝0\leq\alpha\leq\frac{1}{2}L(p)0 ≤ italic_α ≤ divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_L ( italic_p ) then

\begin{align*}
\text{\rm KL}(Q,P) &=(p+\alpha)\ln\frac{p+\alpha}{p-\alpha}+(1-p-\alpha)\ln\frac{1-p-\alpha}{1-p+\alpha}\\
&=(p+\alpha)\ln\left(1+\frac{2\alpha}{p-\alpha}\right)+(1-p-\alpha)\ln\left(1-\frac{2\alpha}{1-p+\alpha}\right)\\
&\leq(p+\alpha)\frac{2\alpha}{p-\alpha}-(1-p-\alpha)\frac{2\alpha}{1-p+\alpha}\\
&=\frac{4\alpha^{2}}{p-\alpha}+\frac{4\alpha^{2}}{1-p+\alpha}\\
&=\frac{\alpha^{2}}{(p-\alpha)(1-p+\alpha)}\\
&\leq\frac{1}{4n},
\end{align*}

where the first inequality holds since $\ln(1+x)\leq x$ for all $x>-1$ and, by assumption, $2\alpha/(p-\alpha),\,2\alpha/(1-p+\alpha)\in[0,1]$, and the second follows again from the constraint on $\alpha$. ∎
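As a numerical sanity check on the first inequality of the chain (not part of the proof), the sketch below compares the exact KL divergence between the two Bernoulli distributions with the intermediate bound $\frac{4\alpha^{2}}{p-\alpha}+\frac{4\alpha^{2}}{1-p+\alpha}$. The function names and the grid are illustrative assumptions; $\alpha$ is taken as a fraction of at most $1/3$ of $\min\{p,1-p\}$ so that both ratios lie in $[0,1]$.

```python
import math

def bernoulli_kl(a, b):
    """KL(Bernoulli(a) || Bernoulli(b)) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def kl_upper_bound(p, alpha):
    """Intermediate bound from the chain: 4a^2/(p-a) + 4a^2/(1-p+a)."""
    return 4 * alpha**2 / (p - alpha) + 4 * alpha**2 / (1 - p + alpha)

def check_grid(ps, fractions):
    """Return True if KL(Q, P) <= bound at every grid point.

    Here P = Bernoulli(p - alpha), Q = Bernoulli(p + alpha) and
    alpha = f * min(p, 1 - p) for each fraction f (hypothetical grid).
    """
    for p in ps:
        for f in fractions:
            alpha = f * min(p, 1 - p)
            kl = bernoulli_kl(p + alpha, p - alpha)
            # small slack for floating-point rounding
            if kl > kl_upper_bound(p, alpha) + 1e-15:
                return False
    return True
```

The check only exercises the step that uses $\ln(1+x)\leq x$; the final comparison with $\frac{1}{4n}$ depends on the definition of $L(p)$ and is not tested here.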

See 5.14

Lemma 5.14 is an immediate corollary of the following lemma.

Lemma C.1.

For any distribution $P$, if $\log(n/\beta)>1$ then with probability $1-30pt\beta$,

\[
\mathfrak{W}(\widehat{\mathfrak{G}_{P}},\mathfrak{G}_{P})\leq\sum_{\ell\in[0pt]}\sum_{x\in[N_{\ell}]}\min\left\{P_{\ell}(x),\,1-P_{\ell}(x),\,4\sqrt{3\frac{P_{\ell}(x)\log(n/\beta)}{n}},\,4\sqrt{3\frac{(1-P_{\ell}(x))\log(n/\beta)}{n}}\right\}.
\]
Proof of Lemma 5.14.

We’ll consider each level of the tree individually then use a union bound over all the levels to obtain our final bound. Let $(\hat{P}_{\ell})_{n}$ be the empirical distribution without truncation. The following conditions are sufficient to ensure that the bounds hold for a single level $\ell$:

\[
\sup_{\nu\text{ s.t. }P_{\ell}(\nu)\leq\frac{3\ln(n/\beta)}{n}}(\hat{P}_{\ell})_{n}(\nu)\leq\frac{7\ln(n/\beta)}{n}
\]
\[
\sup_{\nu\text{ s.t. }P_{\ell}(\nu)\geq 1-\frac{3\ln(n/\beta)}{n}}(\hat{P}_{\ell})_{n}(\nu)\geq 1-\frac{7\ln(n/\beta)}{n}
\]
\[
\forall\left(\nu\text{ s.t. }P_{\ell}(\nu)\in\left[\frac{3\ln(n/\beta)}{n},1-\frac{3\ln(n/\beta)}{n}\right]\right),\quad
|(\hat{P}_{\ell})_{n}(\nu)-P_{\ell}(\nu)|\leq\min\left\{\sqrt{\frac{3P_{\ell}(\nu)\ln(n/\beta)}{n}},\sqrt{\frac{3(1-P_{\ell}(\nu))\ln(n/\beta)}{n}}\right\}
\]

We will begin by showing these conditions are sufficient. If $P_{\ell}(\nu)\notin[\frac{3\ln(n/\beta)}{n},1-\frac{3\ln(n/\beta)}{n}]$ then these conditions imply that the empirical density for node $\nu$ is truncated, and hence the error at that node is either $P_{\ell}(\nu)$ or $1-P_{\ell}(\nu)$ (when $P_{\ell}(\nu)<1/2$ and $P_{\ell}(\nu)>1/2$, respectively), as required.
If $P_{\ell}(\nu)\in[\frac{3\ln(n/\beta)}{n},1-\frac{3\ln(n/\beta)}{n}]$ then there are two cases. Either the estimate is not truncated and the error is less than $\min\left\{\sqrt{\frac{3P_{\ell}(\nu)\ln(2n/\beta)}{n}},\sqrt{\frac{3(1-P_{\ell}(\nu))\ln(2n/\beta)}{n}}\right\}\leq\min\{P_{\ell}(\nu),1-P_{\ell}(\nu)\}$, as required, or the estimate is truncated and the error is $\min\{P_{\ell}(\nu),1-P_{\ell}(\nu)\}$. Under the above conditions, if $P_{\ell}(\nu)\leq 1/2$ then, writing $p=P_{\ell}(\nu)$, truncation will only occur if

\[
P_{\ell}(\nu)-\sqrt{\frac{3p\ln(2n/\beta)}{n}}\leq\frac{7\ln(n/\beta)}{n}\leq\sqrt{\frac{7\ln(n/\beta)}{n}\cdot\frac{7}{3}\cdot\frac{3\ln(n/\beta)}{n}}\leq\sqrt{\frac{7\ln(n/\beta)}{n}\cdot\frac{7}{3}\,p}=\frac{7}{3}\sqrt{\frac{3\ln(n/\beta)}{n}\,p},
\]

in which case $P_{\ell}(\nu)\leq 4\sqrt{\frac{3P_{\ell}(\nu)\ln(n/\beta)}{n}}$, as required. Similarly, if $P_{\ell}(\nu)>1/2$ then truncation will only occur if $1-P_{\ell}(\nu)\leq 4\sqrt{\frac{3(1-P_{\ell}(\nu))\ln(n/\beta)}{n}}$, as required.

We will now show that these conditions hold simultaneously with probability at least $1-3\beta$ for all the nodes at level $\ell$. If $P_{\ell}(\nu)\leq\frac{1}{en}$ then using the multiplicative form of the Chernoff bound,

\begin{align*}
\Pr\left((\hat{P}_{\ell})_{n}(\nu)\geq\frac{3\ln(n/\beta)}{n}\right) &=\Pr\left((\hat{P}_{\ell})_{n}(\nu)\geq\left(1+\frac{3\ln(n/\beta)}{P_{\ell}(\nu)n}-1\right)P_{\ell}(\nu)\right)\\
&\leq\left(\frac{e^{\frac{3\ln(n/\beta)}{nP_{\ell}(\nu)}-1}}{\left(\frac{3\ln(n/\beta)}{nP_{\ell}(\nu)}\right)^{\frac{3\ln(n/\beta)}{nP_{\ell}(\nu)}}}\right)^{P_{\ell}(\nu)n}\\
&\leq\left(\frac{enP_{\ell}(\nu)}{3\ln(n/\beta)}\right)^{3\ln(n/\beta)}\\
&\leq P_{\ell}(\nu)n\left(\frac{e}{3\ln(n/\beta)}\right)^{3\ln(n/\beta)}(nP_{\ell}(\nu))^{3\ln(n/\beta)-1}
\end{align*}

Firstly, since $\ln(n/\beta)\geq 1$, $\left(\frac{e}{3\ln(n/\beta)}\right)^{3\ln(n/\beta)}\leq 1$. Further, $nP_{\ell}(\nu)\leq 1/e$ and $3\ln(n/\beta)-1\geq\ln(n/\beta)$ so $(nP_{\ell}(\nu))^{3\ln(n/\beta)-1}\leq(1/e)^{\ln(n/\beta)}=\beta/n$. Therefore,

\[
\Pr\left((\hat{P}_{\ell})_{n}(\nu)\geq\frac{3\ln(n/\beta)}{n}\right)\leq P_{\ell}(\nu)\beta. \tag{12}
\]
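The step from the Chernoff expression to Eqn (12) can be sanity-checked numerically. The sketch below is an illustration rather than part of the proof: it evaluates $\left(\frac{enP_{\ell}(\nu)}{3\ln(n/\beta)}\right)^{3\ln(n/\beta)}$ in log space and compares it with $P_{\ell}(\nu)\beta$. The names `log_tail_bound`, `eq12_holds` and the argument `mass` (standing for $P_{\ell}(\nu)$) are hypothetical.

```python
import math

def log_tail_bound(n, beta, mass):
    """Natural log of (e*n*mass / (3L))**(3L), where L = ln(n/beta)."""
    L = math.log(n / beta)
    return 3 * L * (1 + math.log(n * mass) - math.log(3 * L))

def eq12_holds(n, beta, mass):
    """Compare the Chernoff expression with mass * beta in log space.

    Assumes the hypotheses used in the proof: ln(n/beta) >= 1 and
    0 < mass <= 1/(e*n).
    """
    assert math.log(n / beta) >= 1 and 0 < mass <= 1 / (math.e * n)
    return log_tail_bound(n, beta, mass) <= math.log(mass) + math.log(beta)
```

Working in log space avoids underflow when the node mass is very small, which is exactly the regime this case of the proof covers.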

Let $\mathcal{S}=\{x\in[N_{\ell}]\;|\;P_{\ell}(x)<1/(en)\}$. Then, using a union bound and Eqn (12), we have

\[
\Pr\left(\exists x\in\mathcal{S}\text{ s.t. }(\hat{P}_{\ell})_{n}(x)\geq\frac{3\ln(n/\beta)}{n}\right)\leq\sum_{x\in\mathcal{S}}P_{\ell}(x)\beta\leq\beta.
\]

There exist at most $n$ elements in $[N_{\ell}]$ that do not belong in $\mathcal{S}$. We will prove that each of these elements individually violates the required condition with probability at most $2\beta/n$; a union bound then proves the final result. If $P_{\ell}(x)\in[\frac{3\ln(n/\beta)}{n},1-\frac{3\ln(n/\beta)}{n}]$ then using the multiplicative form of the Chernoff bound (if the $X_{i}$ are all i.i.d. and $0<\delta<1$, then $\Pr(|\sum_{i=1}^{n}X_{i}-n\mathbb{E}[X_{1}]|\geq\delta n\mathbb{E}[X_{1}])\leq 2e^{-\delta^{2}n\mathbb{E}[X_{1}]/3}$),

\begin{align*}
\Pr\left(|(\hat{P}_{\ell})_{n}(x)-P_{\ell}(x)|\geq\sqrt{\frac{3P_{\ell}(x)\log(n/\beta)}{n}}\right) &=\Pr\left(|(\hat{P}_{\ell})_{n}(x)-P_{\ell}(x)|\geq\sqrt{\frac{3\log(n/\beta)}{P_{\ell}(x)n}}\,P_{\ell}(x)\right)\\
&\leq 2e^{\frac{-\left(\frac{3\log(n/\beta)}{P_{\ell}(x)n}\right)P_{\ell}(x)n}{3}}\\
&=2\beta/n.
\end{align*}

Next, if $P_{\ell}(\nu)\leq\frac{3\ln(n/\beta)}{n}$ then using the additive form of the Chernoff bound (if the $X_{i}$ are all i.i.d. with $p=\mathbb{E}[X_{1}]$ and $\varepsilon\geq 0$, then $\Pr(\frac{1}{n}\sum_{i=1}^{n}X_{i}\geq\mathbb{E}[X_{1}]+\varepsilon)\leq e^{-\varepsilon^{2}n/(2(p+\varepsilon))}$), with $p=P_{\ell}(\nu)$,

\begin{align*}
\Pr\left((\hat{P}_{\ell})_{n}(\nu)\geq\frac{7\ln(n/\beta)}{n}\right) &\leq\Pr\left((\hat{P}_{\ell})_{n}(\nu)\geq p+\left(7\frac{\ln(n/\beta)}{n}-p\right)\right)\\
&\leq e^{-\frac{\left(7\frac{\ln(n/\beta)}{n}-p\right)^{2}n}{14\frac{\ln(n/\beta)}{n}}}\\
&\leq e^{-\frac{\left(4\frac{\ln(n/\beta)}{n}\right)^{2}n}{14\frac{\ln(n/\beta)}{n}}}\\
&\leq e^{-\ln(n/\beta)}\\
&=\beta/n.
\end{align*}
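For small $n$ the bound just derived can be compared against the exact binomial tail. The following sketch is only an illustration under stated assumptions: it takes the largest mass allowed in this case, $p=3\ln(n/\beta)/n$ (the upper tail is increasing in $p$, so this is the worst case), and checks that the probability of an empirical mass of at least $7\ln(n/\beta)/n$ is at most $\beta/n$. The function names are hypothetical.

```python
import math

def binomial_upper_tail(n, p, k_min):
    """Exact Pr[Bin(n, p) >= k_min], summed term by term."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

def truncation_tail_ok(n, beta):
    """Check Pr[empirical mass >= 7L/n] <= beta/n at p = 3L/n, L = ln(n/beta)."""
    L = math.log(n / beta)
    p = min(1.0, 3 * L / n)      # worst-case node mass for this case
    k_min = math.ceil(7 * L)     # empirical mass >= 7L/n means count >= 7L
    if k_min > n:
        return True              # the event is impossible
    return binomial_upper_tail(n, p, k_min) <= beta / n
```

The exact tail is typically far below $\beta/n$, which reflects the slack in the constants $3$ and $7$.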

By symmetry, if $P_{\ell}(\nu)\geq 1-\frac{3\ln(n/\beta)}{n}$ then

\[
\Pr\left((\hat{P}_{\ell})_{n}(\nu)\leq 1-\frac{7\ln(n/\beta)}{n}\right)\leq\beta/n.
\]

See 5.15

Proof of Lemma 5.15.

First notice that if a node $\nu$ is an $\alpha$-active node, then all of its ancestor nodes are also $\alpha$-active. So, it suffices to show that (with high probability) if at any stage a node $\nu$ makes it to Line 7 of Algorithm 3, then if $\nu\notin\gamma_{P}\left(2\kappa\right)$ then $\widehat{\mathfrak{G}_{P}}(\nu)+\mathsf{Lap}(\frac{1}{\varepsilon n})\leq 2\kappa+\frac{\log(2/\beta)}{\varepsilon n}$, and if $\nu\in\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{\log(n/\beta)}{n}\right\}\right)$ then $\widehat{\mathfrak{G}_{P}}(\nu)+\mathsf{Lap}(\frac{1}{\varepsilon n})>2\kappa+\frac{\log(2/\beta)}{\varepsilon n}$.

By Lemma 5.14, with probability $1-30pt\beta$, all nodes $\nu$ satisfy

\[
|\widehat{\mathfrak{G}_{P}}(\nu)-\mathfrak{G}_{P}(\nu)|\leq\min\left\{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu)),\,4\sqrt{\frac{3\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu))\log(n/\beta)}{n}}\right\}
\]

Further, if one samples $X$ independent samples from $\mathsf{Lap}(\frac{1}{\varepsilon n})$ then with probability $1-X\beta$,

\[
\sup\left|\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)\right|\leq\frac{\ln(2/\beta)}{\varepsilon n}.
\]
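The Laplace tail claim follows from the closed form $\Pr[|Z|>t]=e^{-t/b}$ for $Z\sim\mathsf{Lap}(b)$ together with a union bound. A minimal sketch, with hypothetical function names and with `num_draws` standing for the number of samples $X$:

```python
import math

def laplace_abs_tail(t, scale):
    """Pr[|Z| > t] for Z ~ Lap(scale) and t >= 0."""
    return math.exp(-t / scale)

def noise_failure_bound(eps, n, beta, num_draws):
    """Union bound on Pr[some draw exceeds ln(2/beta)/(eps*n) in magnitude]."""
    scale = 1.0 / (eps * n)
    threshold = math.log(2 / beta) / (eps * n)
    # each draw exceeds the threshold with probability exp(-ln(2/beta)) = beta/2
    return num_draws * laplace_abs_tail(threshold, scale)
```

Each draw fails with probability $\beta/2$, so the total failure probability is at most $X\beta/2\leq X\beta$, consistent with the stated $1-X\beta$.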

So conditioning on both these events if xγP(12εn)𝑥subscript𝛾𝑃12𝜀𝑛x\notin\gamma_{{P}}\left({\frac{1}{2\varepsilon n}}\right)italic_x ∉ italic_γ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG 2 italic_ε italic_n end_ARG ),

𝔊P^(ν)+𝖫𝖺𝗉(1εn)𝔊P(ν)+ln(2/β)εn12εn+ln(2/β)εn,^subscript𝔊𝑃𝜈𝖫𝖺𝗉1𝜀𝑛subscript𝔊𝑃𝜈2𝛽𝜀𝑛12𝜀𝑛2𝛽𝜀𝑛\widehat{\mathfrak{G}_{P}}(\nu)+\mathsf{Lap}(\frac{1}{\varepsilon n})\leq% \mathfrak{G}_{P}(\nu)+\frac{\ln(2/\beta)}{\varepsilon n}\leq\frac{1}{2% \varepsilon n}+\frac{\ln(2/\beta)}{\varepsilon n},over^ start_ARG fraktur_G start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT end_ARG ( italic_ν ) + sansserif_Lap ( divide start_ARG 1 end_ARG start_ARG italic_ε italic_n end_ARG ) ≤ fraktur_G start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ( italic_ν ) + divide start_ARG roman_ln ( start_ARG 2 / italic_β end_ARG ) end_ARG start_ARG italic_ε italic_n end_ARG ≤ divide start_ARG 1 end_ARG start_ARG 2 italic_ε italic_n end_ARG + divide start_ARG roman_ln ( start_ARG 2 / italic_β end_ARG ) end_ARG start_ARG italic_ε italic_n end_ARG ,

so such nodes will not survive Line 7 of Algorithm 3. If $x\in\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)$ then

\begin{align*}
\widehat{\mathfrak{G}_{P}}(\nu)+\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)
&\geq\mathfrak{G}_{P}(\nu)-4\sqrt{3\frac{\mathfrak{G}_{P}(\nu)\log(n/\beta)}{n}}-\frac{\ln(2/\beta)}{\varepsilon n}\\
&\geq\frac{1}{2}\mathfrak{G}_{P}(\nu)-\frac{\log(2/\beta)}{\varepsilon n}\\
&\geq\frac{1}{\varepsilon n}+\frac{\log(2/\beta)}{n}.
\end{align*}
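The two thresholds inside the maximum are exactly what the last two inequalities need. A short check, assuming that $\gamma_{P}(\kappa)$ denotes the set of nodes with $\mathfrak{G}_{P}(\nu)\geq\kappa$, that $\mathfrak{G}_{P}(\nu)>0$, and treating $\log$ and $\ln$ as the same logarithm:

```latex
% Sampling term: absorbed into half of the true weight
% once G_P(nu) >= 192 log(n/beta)/n (square both sides, then divide by G_P(nu) > 0).
\[
4\sqrt{\frac{3\,\mathfrak{G}_{P}(\nu)\log(n/\beta)}{n}}\leq\frac{1}{2}\mathfrak{G}_{P}(\nu)
\iff\frac{48\,\mathfrak{G}_{P}(\nu)\log(n/\beta)}{n}\leq\frac{1}{4}\mathfrak{G}_{P}(\nu)^{2}
\iff\mathfrak{G}_{P}(\nu)\geq\frac{192\log(n/\beta)}{n}.
\]
% Privacy term: uses G_P(nu) >= 2/(eps n) + 4 log(2/beta)/(eps n).
\[
\frac{1}{2}\mathfrak{G}_{P}(\nu)-\frac{\log(2/\beta)}{\varepsilon n}
\geq\frac{1}{\varepsilon n}+\frac{2\log(2/\beta)}{\varepsilon n}-\frac{\log(2/\beta)}{\varepsilon n}
=\frac{1}{\varepsilon n}+\frac{\log(2/\beta)}{\varepsilon n}.
\]
```

Whenever $\varepsilon\leq 1$, the right-hand side is at least $\frac{1}{\varepsilon n}+\frac{\log(2/\beta)}{n}$, which is the form used in the last line of the display above.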

Each level has at most $2\varepsilon n$ nodes in $\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)$, so we query at most $4\varepsilon n$ nodes at each level of the tree when running \texttt{LocateActiveNodes}, since each node has at most 2 children. Therefore, we can set $X=4\varepsilon nD_{T}$. ∎
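The counting step can be sketched as follows. This assumes that the weights $\mathfrak{G}_{P}(\nu)$ of the nodes at a fixed level $\ell$ are nonnegative and sum to at most $1$, and that $\gamma_{P}(\kappa)$ collects the nodes of weight at least $\kappa$.

```latex
% Each node of gamma_P(1/(2 eps n)) at level ell carries weight at least 1/(2 eps n):
\[
1\geq\sum_{\nu\text{ at level }\ell}\mathfrak{G}_{P}(\nu)
\geq\left|\left\{\nu\text{ at level }\ell:\nu\in\gamma_{P}\left(\tfrac{1}{2\varepsilon n}\right)\right\}\right|\cdot\frac{1}{2\varepsilon n},
\]
% so there are at most 2 eps n such nodes per level, and with at most 2 children each:
\[
\#\{\text{nodes queried per level}\}\leq 2\cdot 2\varepsilon n=4\varepsilon n .
\]
```

Multiplying by the number of levels of the tree gives the total number of Laplace samples used in the union bound.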

See 5.16

Proof of Lemma 5.16.

The key component of this proof is that any discrepancy between the weight of the nodes on $P$ and that assigned by $\widehat{\mathfrak{G}_{P}}$ was already paid for in $\mathcal{W}(P,\widehat{\mathfrak{G}_{P}})$.

\begin{align*}
\mathfrak{W}(\widehat{\mathfrak{G}_{P}},\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})
&=\sum_{\nu\notin\hat{\gamma}_{\varepsilon}}r_{\nu}|\widehat{\mathfrak{G}_{P}}(\nu)|\\
&\leq\sum_{\nu\notin\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}r_{\nu}|\widehat{\mathfrak{G}_{P}}(\nu)|\\
&=\sum_{\nu\notin\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}r_{\nu}|\widehat{\mathfrak{G}_{P}}(\nu)-\mathfrak{G}_{P}(\nu)+\mathfrak{G}_{P}(\nu)|\\
&\leq\sum_{\nu\notin\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}r_{\nu}|\widehat{\mathfrak{G}_{P}}(\nu)-\mathfrak{G}_{P}(\nu)|\\
&\qquad+\sum_{\nu\notin\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}r_{\nu}|\mathfrak{G}_{P}(\nu)|\\
&\leq\mathfrak{W}(\mathfrak{G}_{P},\widehat{\mathfrak{G}_{P}})+\mathfrak{W}\left(\mathfrak{G}_{P},\mathfrak{G}_{P}|_{\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)}\right),
\end{align*}

as required. ∎

See 5.17

Proof of Lemma 5.17.

We first note that for any pair of sequences of real values $a_{1},\cdots,a_{k}$ and $b_{1},\cdots,b_{k}$, and constant $A$ such that $\sum_{i}a_{i}\neq 0$,

\[
\sum\left|\frac{A}{\sum a_{i}}a_{i}-b_{i}\right|\leq\sum\left|\frac{A}{\sum a_{i}}a_{i}-a_{i}\right|+|a_{i}-b_{i}|=\left|A-\sum a_{i}\right|+\sum|a_{i}-b_{i}|\leq\left|A-\sum b_{i}\right|+2\sum|a_{i}-b_{i}|.
\]
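The two non-trivial steps in the display above can be spelled out as follows. This sketch assumes the $a_{i}$ are nonnegative (as node weights are), which is what makes the middle equality hold; the summation index $j$ is introduced only for readability.

```latex
% Middle equality: factor out the common rescaling, using a_i >= 0 and sum_j a_j != 0.
\[
\sum_{i}\left|\frac{A}{\sum_{j}a_{j}}a_{i}-a_{i}\right|
=\left|\frac{A}{\sum_{j}a_{j}}-1\right|\sum_{i}a_{i}
=\left|A-\sum_{j}a_{j}\right| .
\]
% Final inequality: triangle inequality applied twice.
\[
\left|A-\sum_{i}a_{i}\right|
\leq\left|A-\sum_{i}b_{i}\right|+\left|\sum_{i}(b_{i}-a_{i})\right|
\leq\left|A-\sum_{i}b_{i}\right|+\sum_{i}|a_{i}-b_{i}| .
\]
```

Adding the remaining $\sum|a_{i}-b_{i}|$ term gives the factor of $2$ on the right-hand side.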

Also, if $\sum a_{i}=0$ then

\[
\sum\left|\frac{A}{k}-b_{i}\right|\leq\sum\left|\frac{A}{k}-\frac{\sum_{i}b_{i}}{k}\right|+\left|\frac{\sum_{i}b_{i}}{k}-b_{i}\right|\leq\left|A-\sum b_{i}\right|+2\sum|b_{i}|=\left|A-\sum b_{i}\right|+2\sum|a_{i}-b_{i}|.
\]

Let $\bar{\mathfrak{G}}^{\ell}$ be the function $\bar{\mathfrak{G}}$ after only levels $0,\cdots,\ell$ have been updated. So $\bar{\mathfrak{G}}^{\ell}$ matches $\bar{\mathfrak{G}}^{\ell-1}$ on all levels except $\ell$. Let $\nu$ be a node in the $\ell$th level of the HST. If we suppose the sum is over the normalised children of a node $\nu$, $A=\bar{\mathfrak{G}}^{\ell-1}(\nu)$, and for all the children $\nu'$ of $\nu$, $a_{i}=\mathfrak{G}(\nu')$ and $b_{i}=\mathfrak{G}_{P}(\nu')$, we can see that the contribution to the Wasserstein distance by the children increases by an additive term of $|\mathfrak{G}^{\ell-1}(\nu)-\mathfrak{G}_{P}(\nu)|$. Iterating, we can see that

\[
\mathcal{W}(P,\texttt{Projection}(\mathfrak{G}))\leq 2\sum_{\ell=0}^{D_{T}}\sum_{\nu\text{ at level }\ell}(r_{\ell}+r_{\ell+1}+\cdots+r_{D_{T}})|\mathfrak{G}(\nu)-\mathfrak{G}_{P}(\nu)|\leq 4\sum_{\ell=0}^{D_{T}}\sum_{\nu\text{ at level }\ell}r_{\ell}|\mathfrak{G}(\nu)-\mathfrak{G}_{P}(\nu)|,
\]

which is 4 times the Wasserstein distance.
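The extra factor of 2 in the last inequality comes from a geometric sum over levels. A sketch, under the assumption (not restated in this appendix) that the HST edge lengths at least halve from one level to the next, i.e. $r_{j+1}\leq r_{j}/2$:

```latex
% Sum of edge lengths over all levels j >= ell of the tree,
% bounded by a geometric series with ratio 1/2.
\[
\sum_{j\geq\ell}r_{j}\leq r_{\ell}\sum_{i\geq 0}2^{-i}=2r_{\ell}.
\]
```

If the edge lengths decay by a different constant factor, the same argument goes through with a different constant in place of 4.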

Appendix D Local Minimality in the High Dimensional Setting

Theorem D.1.

Given any $\varepsilon>0$ and a distribution $P$, let $n'=\frac{5}{4\min\{W(\frac{0.45\varepsilon}{\delta}),0.6\}}n$. Then for all $(\varepsilon,\delta)$-DP algorithms $\mathcal{A}'$, there exists a distribution $Q\in\mathcal{N}(P)$ such that with probability $1-(D_{T}\log n+4D_{T}\varepsilon n)\beta$,

\[
\mathcal{W}(Q,\hat{Q_{\varepsilon,n'}})\leq\tilde{O}\left(\mathbb{E}_{X\sim Q^{n},\mathcal{A}'}\left(\mathcal{W}(\mathcal{A}'(X),Q)\right)\right),
\]

where $\hat{Q_{\varepsilon,n'}}$ is the output of $\texttt{PrivDensityEstTree}(Q)$ with $n'$ samples.

Proof.

First, let us obtain a slightly simpler upper bound on $\mathcal{W}(P,\hat{P}_{\varepsilon})$. From eqn (8) in the proof of Theorem 5.13 we have that for each level $\ell$,

\[
\mathcal{W}(P_{\ell},(\hat{P}_{\varepsilon})_{\ell})\leq 2\left(\mathfrak{W}((\mathfrak{G}_{P})_{\ell},(\widehat{\mathfrak{G}_{P}})_{\ell})+\mathfrak{W}((\widehat{\mathfrak{G}_{P}})_{\ell},(\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell})+\mathfrak{W}((\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\right),
\]

from Lemma 5.15 we have that, with probability $1-D_{T}(\log n+4\varepsilon n)\beta$,

\[
\gamma_{P}\left(\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\right)\subset\hat{\gamma}_{\varepsilon}\subset\gamma_{P}\left(\frac{1}{2\varepsilon n}\right),
\]

and if one draws $4\varepsilon nD_{T}$ independent samples from $\mathsf{Lap}(\frac{1}{\varepsilon n})$ then we have that, with probability $1-4\varepsilon nD_{T}\beta$,

\[
\sup\left|\mathsf{Lap}\left(\frac{1}{\varepsilon n}\right)\right|\leq\frac{\ln(2/\beta)}{\varepsilon n}.
\]

Hence, for all $\nu\notin\hat{\gamma}_{\varepsilon}$ we have $P_{\ell}(\nu)\leq\max\left\{\frac{2}{\varepsilon n}+4\frac{\log(2/\beta)}{\varepsilon n},\frac{192\log(n/\beta)}{n}\right\}\leq C\frac{\ln(n/\beta)}{\varepsilon n}$ for some constant $C$. Therefore,

\begin{align*}
&\mathfrak{W}((\widehat{\mathfrak{G}_{P}})_{\ell},(\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell})+\mathfrak{W}((\widehat{\mathfrak{G}_{P}}|_{\hat{\gamma}_{\varepsilon}})_{\ell},(\widetilde{\mathfrak{G}_{\hat{P}_{n},\hat{\gamma}_{\varepsilon}}})_{\ell})\\
&\qquad\leq\sum_{\nu\notin\hat{\gamma}_{\varepsilon}}P_{\ell}(\nu)+\sum_{\nu\in\hat{\gamma}_{\varepsilon}}\frac{\ln(2/\beta)}{\varepsilon n}\\
&\qquad\leq\sum_{\nu\notin\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}P_{\ell}(\nu)+\sum_{\nu\in\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)\backslash\hat{\gamma}_{\varepsilon}}C\frac{\ln(n/\beta)}{\varepsilon n}+\sum_{\nu\in\hat{\gamma}_{\varepsilon}}\frac{\ln(2/\beta)}{\varepsilon n}\\
&\qquad\leq\sum_{\nu\notin\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}P_{\ell}(\nu)+C\ln(n/\beta)\sum_{\nu\in\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}\frac{1}{\varepsilon n}.
\end{align*}

For the same reason as in the proof of Theorem 5.13, we can upper bound $\sum_{\nu\in\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}\frac{1}{\varepsilon n}$ by $\left(\left|\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}$ by dealing with the $\left|\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)\right|=1$ case separately. Therefore,

\begin{align*}
&\mathcal{W}(P,\hat{P}_{\varepsilon})\\
&\leq 2C\ln(n/\beta)\left(\sum_{\nu}\min\left\{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu)),\sqrt{\frac{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu))}{n}}\right\}+\sum_{\nu\notin\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)}\mathfrak{G}_{P}(\nu)+\left(\left|\gamma_{P}\left(\frac{1}{2\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}\right).
\end{align*}

Further, by Theorem 5.7 and Theorem 5.6, given $\varepsilon>0$ and $\delta\in[0,1]$, let $\kappa=\frac{1}{10\varepsilon n}\min\left\{W\left(\frac{0.45\varepsilon}{\delta}\right),0.6\right\}$, where $W(x)$ is the Lambert W function, so $W(x)e^{W(x)}=x$. Given a distribution $P$, there exists a constant $C'$ such that

\[
\mathcal{R}_{\mathcal{N},n,\varepsilon}(P)\geq\frac{C'}{D_{T}}\left(\sum_{\nu}\min\left\{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu)),\sqrt{\frac{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu))}{n}}\right\}+\sum_{\nu\notin\gamma_{P}(2\kappa)}\mathfrak{G}_{P}(\nu)+(|\gamma_{P}(2\kappa)|-1)\kappa\right).
\]
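To get a feel for $\kappa$ (and for $n'$ in Theorem D.1), note that the minimum is attained by the constant $0.6$ unless $\delta$ is fairly large compared to $\varepsilon$. A worked sketch, assuming $\delta>0$ and using only that $W$ is increasing on $[0,\infty)$; the numerical values are rounded:

```latex
% W(x) >= 0.6 exactly when x >= 0.6 e^{0.6}, by monotonicity of w -> w e^w.
\[
W(x)\geq 0.6\iff x\geq 0.6\,e^{0.6}\approx 1.093,
\qquad\text{so}\qquad
W\left(\frac{0.45\varepsilon}{\delta}\right)\geq 0.6\iff\delta\leq\frac{0.45}{0.6\,e^{0.6}}\,\varepsilon\approx 0.41\,\varepsilon .
\]
% In that regime the minimum equals 0.6, and the two quantities simplify to:
\[
\kappa=\frac{0.6}{10\varepsilon n}=\frac{3}{50\,\varepsilon n},
\qquad
n'=\frac{5}{4\cdot 0.6}\,n=\frac{25}{12}\,n .
\]
```

In particular, in the usual regime where $\delta$ is much smaller than $\varepsilon$, $\kappa$ is a constant multiple of $\frac{1}{\varepsilon n}$ and $n'$ is a constant multiple of $n$.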

Let $Q\in\mathcal{N}(P)$. Then $\gamma_{P}\left(\frac{1}{\varepsilon n}\right)\subset\gamma_{Q}\left(\frac{1}{2\varepsilon n}\right)\subset\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)$, so

\begin{align*}
\sum_{\nu\notin\gamma_{Q}\left(\frac{1}{2\varepsilon n}\right)}\mathfrak{G}_{Q}(\nu)+\left(\left|\gamma_{Q}\left(\tfrac{1}{2\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}
&=\sum_{\nu\notin\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)}\mathfrak{G}_{Q}(\nu)+\sum_{\nu\in\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)\backslash\gamma_{Q}\left(\frac{1}{2\varepsilon n}\right)}\mathfrak{G}_{Q}(\nu)+\left(\left|\gamma_{Q}\left(\tfrac{1}{2\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}\\
&\leq\sum_{\nu\notin\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)}2\mathfrak{G}_{P}(\nu)+\sum_{\nu\in\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)\backslash\gamma_{Q}\left(\frac{1}{2\varepsilon n}\right)}\frac{1}{\varepsilon n}+\left(\left|\gamma_{Q}\left(\tfrac{1}{2\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}\\
&\leq\sum_{\nu\notin\gamma_{P}\left(\frac{1}{4\varepsilon n}\right)}2\mathfrak{G}_{P}(\nu)+\left(\left|\gamma_{P}\left(\tfrac{1}{4\varepsilon n}\right)\right|-1\right)\frac{1}{\varepsilon n}.
\end{align*}

Now, let $n'=\frac{5}{4\min\{W(\frac{0.45\varepsilon}{\delta}),0.6\}}n\geq n$, so that for all $Q\in\mathcal{N}(P)$,

\begin{align*}
\mathcal{W}(Q,\hat{Q_{\varepsilon,n'}})
&\leq\tilde{O}\left(\sum_{\nu}\min\left\{\mathfrak{G}_{Q}(\nu)(1-\mathfrak{G}_{Q}(\nu)),\sqrt{\frac{\mathfrak{G}_{Q}(\nu)(1-\mathfrak{G}_{Q}(\nu))}{n'}}\right\}+\sum_{\nu\notin\gamma_{Q}\left(\frac{1}{2\varepsilon n'}\right)}\mathfrak{G}_{Q}(\nu)+\left(\left|\gamma_{Q}\left(\tfrac{1}{2\varepsilon n'}\right)\right|-1\right)\frac{1}{\varepsilon n'}\right)\\
&\leq\tilde{O}\left(\sum_{\nu}2\min\left\{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu)),\sqrt{\frac{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu))}{n'}}\right\}+\sum_{\nu\notin\gamma_{P}\left(\frac{1}{4\varepsilon n'}\right)}2\mathfrak{G}_{P}(\nu)+\left(\left|\gamma_{P}\left(\tfrac{1}{4\varepsilon n'}\right)\right|-1\right)\frac{1}{\varepsilon n'}\right)\\
&=\tilde{O}\left(\sum_{\nu}2\min\left\{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu)),\sqrt{\frac{\mathfrak{G}_{P}(\nu)(1-\mathfrak{G}_{P}(\nu))}{n}}\right\}+\sum_{\nu\notin\gamma_{P}(2\kappa)}2\mathfrak{G}_{P}(\nu)+\left(|\gamma_{P}(2\kappa)|-1\right)\frac{1}{\varepsilon n}\right)\\
&\leq\tilde{O}\left(\min_{\mathcal{A}'}\max_{Q'\in\mathcal{N}(P)}\mathbb{E}_{X\sim Q'^{n},\mathcal{A}'}\left(\mathcal{W}(\mathcal{A}'(X),Q')\right)\right).
\end{align*}
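The step that replaces $n'$ by $n$ and $\frac{1}{4\varepsilon n'}$ by $2\kappa$ in the chain above can be checked directly from the definitions of $\kappa$ and $n'$. As a short verification, write $m=\min\{W(\frac{0.45\varepsilon}{\delta}),0.6\}$ (the shorthand $m$ is introduced here only for this check):

```latex
\begin{align*}
\frac{1}{4\varepsilon n'} &= \frac{1}{4\varepsilon}\cdot\frac{4m}{5n} = \frac{m}{5\varepsilon n} = 2\cdot\frac{m}{10\varepsilon n} = 2\kappa,\\
\frac{n'}{n} &= \frac{5}{4m} \geq \frac{5}{4\cdot 0.6} = \frac{25}{12} > 1.
\end{align*}
```

The first line gives $\gamma_{P}\left(\frac{1}{4\varepsilon n'}\right)=\gamma_{P}(2\kappa)$, and the second gives $n'\geq n$, so each term involving $\frac{1}{n'}$ is at most the corresponding term involving $\frac{1}{n}$.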

As in Proposition 3.5, since $\mathcal{N}(P)$ is compact, for all $\mathcal{A}'$ there exists a specific $Q^{*}\in\mathcal{N}(P)$ such that

\[
\mathcal{W}(Q^{*},\hat{Q^{*}_{\varepsilon,n'}})\leq\tilde{O}\left(\mathbb{E}_{X\sim(Q^{*})^{n},\mathcal{A}'}\left(\mathcal{W}(\mathcal{A}'(X),Q^{*})\right)\right).
\]

Appendix E Differentially Private Quantiles

Estimating appropriately chosen quantiles is the main part of our algorithm for approximating a distribution over $\mathbb{R}$ in Wasserstein distance. In this section, we describe some known differentially private algorithms for this task and derive corollaries that we use extensively in our application. We use $F$ to denote CDFs, with $F_{P}$ denoting the CDF of distribution $P$. We start by stating an important theorem on private CDF estimation, which follows from the binary tree mechanism [CSS11, DNPR10]. A version of this theorem for approximate differential privacy is described in a survey by Kamath and Ullman [KU20, Theorem 4.1]. The version presented here for pure differential privacy follows from a very similar argument, using Laplace noise instead of Gaussian noise (and basic composition instead of advanced composition to analyze privacy). Their accuracy guarantee holds in expectation, but a similar analysis yields a high-probability bound, as in the theorem below.

Theorem E.1.

[KU20, Theorem 4.1]

Let $\varepsilon,\beta\in(0,1]$, let $D$ be an ordered, finite domain, and let $\mathbf{x}\in D^{n}$ be a dataset. Let $\hat{P}_{n}$ be the uniform distribution on $\mathbf{x}$. Then there exists an $\varepsilon$-DP algorithm $A^{CDF}$ that, on input $\mathbf{x}$ and the domain $D$, outputs a vector $G$ over $D$ such that with probability at least $1-\beta$ over the randomness of $A^{CDF}$:

\[
\|G-F_{\hat{P}_{n}}\|_{\infty}=O\left(\frac{\log^{3}\frac{|D|}{\beta}}{\varepsilon n}\right).
\]
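To make the binary tree mechanism behind this theorem concrete, here is a simplified sketch with Laplace noise. It is an illustration under stated assumptions, not the exact algorithm analyzed in [KU20]: the function names `laplace` and `private_cdf` are hypothetical, neighboring datasets are assumed to differ by replacing one point, and no attempt is made to match the constants or the $\log^{3}$ rate of the theorem.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_cdf(xs, domain, eps, rng=None):
    """Noisy empirical CDF over an ordered finite domain (tree-mechanism sketch)."""
    rng = rng or random.Random()
    index = {v: i for i, v in enumerate(domain)}
    size = 1 << (len(domain) - 1).bit_length()  # pad the domain to a power of two
    levels = size.bit_length()                  # number of tree levels, leaves to root

    # counts[j][i] = number of data points in dyadic block i at level j.
    counts = [[0] * (size >> j) for j in range(levels)]
    for x in xs:
        leaf = index[x]
        for j in range(levels):
            counts[j][leaf >> j] += 1

    # Assumption: replacing one point changes at most two counts per level by 1,
    # so the L1 sensitivity of the whole tree is at most 2 * levels.
    scale = 2.0 * levels / eps
    noisy = [[c + laplace(scale, rng) for c in row] for row in counts]

    cdf = []
    for t in range(len(domain)):
        end, pos, total = t + 1, 0, 0.0
        # Write the prefix [0, end) as a disjoint union of dyadic blocks.
        for j in reversed(range(levels)):
            if end & (1 << j):
                total += noisy[j][pos >> j]
                pos += 1 << j
        cdf.append(total / len(xs))
    return cdf
```

Each CDF value sums at most one noisy count per level, which is why the error depends only polylogarithmically on $|D|$ rather than linearly.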

CDF estimation is intimately related to quantile estimation, and we use the following quantitative statement that will follow from a simple application of Theorem E.1.

Theorem E.2.

Fix any $n>0$, $\varepsilon,\beta\in(0,1]$, $a,b\in\mathbb{R}$, and $\gamma<b-a$ such that $\frac{b-a}{\gamma}$ is an integer. Let $C$ be a sufficiently large constant. Then there exists an $\varepsilon$-DP algorithm $A_{quant}$ that, on input interval endpoints $a,b$, granularity $\gamma$, $\mathbf{x}=(x_{1},\dots,x_{n})\in\{a,a+\gamma,\dots,b-\gamma,b\}^{n}$, and desired quantile values $\boldsymbol{\alpha}\in(0,1)^{k}$, outputs quantiles $\tilde{q}\in\{a,a+\gamma,\dots,b-\gamma,b\}^{k}$ such that with probability at least $1-\beta$ over the randomness of $A_{quant}$, for all $r\in[k]$,

\[
\alpha_{r}-F_{\hat{P}_{n}}(\tilde{q}_{r})\leq C\frac{\log^{3}\frac{b-a}{\beta\gamma}}{\varepsilon n},
\]

and

\[
\Pr_{y\sim\hat{P}_{n}}(y<\tilde{q}_{r})<\alpha_{r}+C\frac{\log^{3}\frac{b-a}{\beta\gamma}}{\varepsilon n},
\]

where $\hat{P}_{n}$ is the uniform distribution on the entries of $\mathbf{x}$.

Proof.

Algorithm $A_{quant}$ runs the algorithm $A^{CDF}$ from Theorem E.1 on $\mathbf{x}$ and the domain $\{a,a+\gamma,\dots,b-\gamma,b\}$, and postprocesses its output to obtain quantile estimates as follows. For every quantile $\alpha_{r}$ that we are asked to estimate, $A_{quant}$ scans the vector $G$ output by $A^{CDF}$ in order and outputs the first domain element whose CDF estimate in $G$ crosses $\alpha_{r}$. Conditioned on the accuracy of the CDF estimate $G$, this output $\tilde{q}_{r}$ satisfies

\[
\alpha_{r}-F_{\hat{P}_{n}}(\tilde{q}_{r})\leq C\frac{\log^{3}\frac{b-a}{\beta\gamma}}{\varepsilon n}.
\]

Additionally, since $\tilde{q}_{r}$ is the first domain element whose estimate in $G$ crosses $\alpha_{r}$, we also have that

\[
\Pr_{y\sim\hat{P}_{n}}(y<\tilde{q}_{r})<\alpha_{r}+C\frac{\log^{3}\frac{b-a}{\beta\gamma}}{\varepsilon n}.
\]

Hence, with probability at least $1-\beta$, both properties hold for all $r\in[k]$. ∎
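The postprocessing step in this proof can be sketched in a few lines. This is only an illustration: the function name `quantiles_from_cdf` is hypothetical, "crosses $\alpha_{r}$" is interpreted as "is at least $\alpha_{r}$", and returning the last grid point when no entry crosses is an assumption about a case the proof does not spell out.

```python
def quantiles_from_cdf(grid, G, alphas):
    """For each alpha, return the first grid point whose CDF estimate reaches alpha."""
    out = []
    for alpha in alphas:
        q = grid[-1]  # assumed fallback if no estimate reaches alpha
        # Linear scan: a noisy G need not be monotone, so binary search is unsafe.
        for x, g in zip(grid, G):
            if g >= alpha:
                q = x
                break
        out.append(q)
    return out
```

Because this step only reads $G$ and never touches the data again, it inherits the $\varepsilon$-DP guarantee of $A^{CDF}$ by postprocessing.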

We now state a corollary of this theorem that we will use extensively in our presentation.

Corollary E.3.

Fix any $\varepsilon,\beta\in(0,1]$, $a,b\in\mathbb{R}$, and $\gamma<b-a$ such that $\frac{b-a}{\gamma}$ is an integer. Let $n\in\mathbb{N}$ with $n>\frac{4c_{2}\log^{4}\left(\frac{b-a}{\beta\gamma\varepsilon}\right)}{\varepsilon}$, so that $k$, set to $\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$, is an integer greater than or equal to $1$, where $c_{2}$ and $c_{3}$ are sufficiently large constants.\footnote{$k$ is set to be sufficiently small in order to relate the accuracy of the quantiles algorithm to a parameter depending on $k$, and $n$ is set sufficiently large that $k$ is not less than $1$. The dependence on $\beta$ comes up in the proof of 6.18.}

Then there exists an $\varepsilon$-DP algorithm $A_{quant}$ (the same one referenced in Theorem E.2) that, on input interval endpoints $a,b$, granularity $\gamma$, $\mathbf{x}=(x_{1},\dots,x_{n})\in\{a,a+\gamma,\dots,b-\gamma,b\}^{n}$, and desired quantile values $\boldsymbol{\alpha}=\{1/2k,3/2k,5/2k,\dots,(2k-1)/2k\}$, outputs quantiles $\tilde{q}\in\{a,a+\gamma,\dots,b-\gamma,b\}^{k}$ such that with probability at least $1-\beta$, for all $r\in[k]$,

\[
\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}}\leq\tilde{q}_{r}\leq\hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}},
\]

where $\hat{P}_{n}$ is the uniform distribution on the entries of $\mathbf{x}$ and, for all $p\in(0,1)$, $\hat{q}_{p}$ is the $p$-quantile of $\hat{P}_{n}$.

Proof.

First, note that $k$ is set such that $\frac{1}{4k}\geq C\frac{\log^{3}\frac{b-a}{\beta\gamma}}{\varepsilon n}$.

Hence, by Theorem E.2, we have that with probability at least $1-\beta$, for all $r\in[k]$,

\[
\frac{2r-1}{2k}-F_{\hat{P}_{n}}(\tilde{q}_{r})\leq\frac{1}{4k},
\]

and

\[
\Pr_{y\sim\hat{P}_{n}}(y<\tilde{q}_{r})<\frac{2r-1}{2k}+\frac{1}{4k}.
\]

Condition on the event above for the rest of the proof. Note that the first inequality implies that for all $r\in[k]$,

\[
\Pr_{y\sim\hat{P}_{n}}(y\leq\tilde{q}_{r})\geq\frac{2r-1}{2k}-\frac{1}{4k},
\]

which implies that $\tilde{q}_{r}\geq\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}}$.

Next, note that we also have that for all $r\in[k]$,

\[
\Pr_{y\sim\hat{P}_{n}}(y<\tilde{q}_{r})<\frac{2r-1}{2k}+\frac{1}{4k}.
\]

This implies that for all $r\in[k]$, $\tilde{q}_{r}\leq\hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}}$. ∎
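One way to read this guarantee: the target levels $\frac{2r-1}{2k}$ are spaced $\frac{1}{k}$ apart, and each estimate is pinned to a window of half-width $\frac{1}{4k}$ around its target, so the windows for consecutive values of $r$ do not overlap. A short check of the gap between the upper end of window $r$ and the lower end of window $r+1$:

```latex
\[
\left(\frac{2(r+1)-1}{2k}-\frac{1}{4k}\right)-\left(\frac{2r-1}{2k}+\frac{1}{4k}\right)
=\frac{1}{k}-\frac{1}{2k}=\frac{1}{2k}>0.
\]
```

Since empirical quantiles are monotone in the level, on the event of the corollary this gives $\tilde{q}_{r}\leq\hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}}\leq\hat{q}_{\frac{2r+1}{2k}-\frac{1}{4k}}\leq\tilde{q}_{r+1}$, so the private quantile estimates are themselves in sorted order.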

Appendix F Proofs in Section 6

F.1 Omitted Proofs in Section 6.1.1

Proof of Lemma 6.6.

We evaluate the various terms in Theorem 6.5.

We start by evaluating
\[
\tau(P,Q)=\max\left\{\int_{\mathbb{R}}\max(f_{P}(t)-e^{\varepsilon}f_{Q}(t),0)\,dt,\ \int_{\mathbb{R}}\max(f_{Q}(t)-e^{\varepsilon}f_{P}(t),0)\,dt\right\}.
\]
Consider the first term in the outer maximum. For all $t\in[L(P),q_{1/k})$, we have that $f_{Q}(t)=\frac{1}{2}f_{P}(t)$. For all other $t$, one can see that the value of the integrand is $0$. Hence, the value of the first term is $\int_{L(P)}^{q_{1/k}}\max(f_{P}(t)-\frac{e^{\varepsilon}}{2}f_{P}(t),0)\,dt=\max\{\left(1-\frac{e^{\varepsilon}}{2}\right)\frac{1}{k},0\}\leq\frac{1}{2k}$. Now, consider the second term in the outer maximum. For all $t<q_{1-\frac{1}{k}}$, the value of the integrand is $0$. For all $q_{1-\frac{1}{k}}\leq t\leq q_{1}$, the value of the integrand is $\max\{\left(\frac{3}{2}-e^{\varepsilon}\right)f_{P}(t),0\}$. Hence, the second term is $\max\{\left(\frac{3}{2}-e^{\varepsilon}\right)\frac{1}{k},0\}\leq\frac{1}{2k}$. Put together, we get that $\tau(P,Q)\leq\frac{1}{2k}$.
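
As a quick numerical sanity check of this bound, the following sketch evaluates both integrals in $\tau(P,Q)$ by reducing them to three regions on which $f_{Q}/f_{P}$ is constant. It assumes, as the calculation above indicates, that $Q$ has density $\frac{1}{2}f_{P}$ on the lowest $\frac{1}{k}$ mass of $P$, $f_{P}$ in the middle, and $\frac{3}{2}f_{P}$ on the top $\frac{1}{k}$ mass. The function name `tau` and the region encoding are illustrative and not taken from the paper.

```python
import math

def tau(eps: float, k: float) -> float:
    """Evaluate tau(P, Q) for the three-region construction.

    Assumption (hypothetical encoding): each region is (P-mass, f_Q / f_P),
    with ratio 1/2 on the bottom 1/k mass, 1 in the middle, 3/2 on the top 1/k.
    """
    regions = [(1.0 / k, 0.5), (1.0 - 2.0 / k, 1.0), (1.0 / k, 1.5)]
    e = math.exp(eps)
    # First term: integral of max(f_P - e^eps * f_Q, 0).
    first = sum(m * max(1.0 - e * r, 0.0) for m, r in regions)
    # Second term: integral of max(f_Q - e^eps * f_P, 0).
    second = sum(m * max(r - e, 0.0) for m, r in regions)
    return max(first, second)

# The bound tau(P, Q) <= 1/(2k) should hold for every eps >= 0 and k >= 2.
for eps in (0.0, 0.1, 0.5, math.log(2), 1.0):
    for k in (2, 8, 64):
        assert tau(eps, k) <= 1.0 / (2 * k) + 1e-12
```

Because the density ratio is piecewise constant, the three-atom reduction gives the same value as integrating over $\mathbb{R}$ for any $P$ with a density.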

When $\varepsilon\geq\ln 2$, we have that $1-\frac{e^{\varepsilon}}{2}\leq 0$, and so the largest value of $\varepsilon'\in[0,\varepsilon]$ that makes $\int_{\mathbb{R}}\max(f_{Q}(t)-e^{\varepsilon'}f_{P}(t),0)\,dt=\tau(P,Q)=0$ is $\varepsilon'=\varepsilon$. When $\varepsilon<\ln 2$, the value of $\varepsilon'$ that makes $\int_{\mathbb{R}}\max(f_{Q}(t)-e^{\varepsilon'}f_{P}(t),0)\,dt=\max\{\left(\frac{3}{2}-e^{\varepsilon'}\right)\frac{1}{k},0\}=\left(1-\frac{e^{\varepsilon}}{2}\right)\frac{1}{k}$ is $\varepsilon'=\ln\left(\frac{1+e^{\varepsilon}}{2}\right)$, since solving $\frac{3}{2}-e^{\varepsilon'}=1-\frac{e^{\varepsilon}}{2}$ gives $e^{\varepsilon'}=\frac{1+e^{\varepsilon}}{2}$.
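
The choice of $\varepsilon'$ can be checked numerically. The sketch below is a minimal illustration under the same three-region assumption on $Q$ (density $\frac{3}{2}f_{P}$ on the top $\frac{1}{k}$ mass of $P$); the helper names `second_term`, `eps_prime` and `tau` are hypothetical.

```python
import math

def second_term(eps_p: float, k: float) -> float:
    """Integral of max(f_Q - e^{eps'} f_P, 0): only the top 1/k mass,
    where f_Q = (3/2) f_P is assumed, can contribute."""
    return max(1.5 - math.exp(eps_p), 0.0) / k

def eps_prime(eps: float) -> float:
    """eps' as stated in the text for the two ranges of eps."""
    if eps >= math.log(2):
        return eps
    return math.log((1.0 + math.exp(eps)) / 2.0)

def tau(eps: float, k: float) -> float:
    """Closed form of tau(P, Q) for the assumed construction."""
    e = math.exp(eps)
    return max(1.0 - e / 2.0, 1.5 - e, 0.0) / k

# eps' lies in [0, eps] and equalizes the second integral with tau(P, Q).
for eps in (0.05, 0.3, 0.69, 0.8, 1.0):
    ep = eps_prime(eps)
    assert 0.0 <= ep <= eps
    assert abs(second_term(ep, 16) - tau(eps, 16)) < 1e-12
```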

Finally, we describe the distributions $P'$ and $Q'$ and compute the squared Hellinger distance between them. There are two cases, based on the range of $\varepsilon$. First, consider $\varepsilon\geq\ln 2$. We calculate $\tilde{P}\equiv\min\{e^{\varepsilon}Q,P\}$. Its density is equal to $\min\{e^{\varepsilon}/2,1\}f_{P}(t)=f_{P}(t)$ for $t<q_{\frac{1}{k}}(P)$, and is also equal to $f_{P}(t)$ for $q_{\frac{1}{k}}\leq t\leq q_{1}$. Similarly, consider $\tilde{Q}\equiv\min\{e^{\varepsilon'}P,Q\}=\min\{e^{\varepsilon}P,Q\}$; its density is equal to $\frac{f_{P}(t)}{2}$ for $t<q_{\frac{1}{k}}(P)$, and is equal to $f_{P}(t)$ for $q_{\frac{1}{k}}\leq t\leq q_{1-\frac{1}{k}}$. It is also equal to $\min(e^{\varepsilon},\frac{3}{2})f_{P}(t)=\frac{3}{2}f_{P}(t)$ for $q_{1-\frac{1}{k}}\leq t\leq q_{1}$. Since $\tau(P,Q)=0$, and by the above calculations, we have that $P'=P$ and $Q'=Q$. Upper bounding the squared Hellinger distance between $P'$ and $Q'$ by the TV distance (see Lemma A.4), we get that
\[
H^{2}(P',Q')=H^{2}(P,Q)\leq TV(P,Q)=\frac{1}{2k}\leq\frac{\varepsilon}{2(\ln 2)k},
\]
where we have used that $\varepsilon\geq\ln 2$.

Next, consider $\varepsilon<\ln 2$. First, consider $\tilde{P}\equiv\min\{e^{\varepsilon}Q,P\}$. Its density is equal to $\min\{e^{\varepsilon}/2,1\}f_{P}(t)=\frac{e^{\varepsilon}}{2}f_{P}(t)$ for $t<q_{\frac{1}{k}}(P)$, and is equal to $f_{P}(t)$ for $q_{\frac{1}{k}}\leq t\leq q_{1}$. Similarly, consider $\tilde{Q}\equiv\min\{e^{\varepsilon'}P,Q\}=\min\{\frac{1+e^{\varepsilon}}{2}P,Q\}$; its density is equal to $\frac{1}{2}f_{P}(t)$ for $t<q_{\frac{1}{k}}(P)$, and is equal to $f_{P}(t)$ for $q_{\frac{1}{k}}\leq t\leq q_{1-\frac{1}{k}}$. It is also equal to $\min\{\frac{1+e^{\varepsilon}}{2},\frac{3}{2}\}f_{P}(t)=\frac{1+e^{\varepsilon}}{2}f_{P}(t)$ for $q_{1-\frac{1}{k}}\leq t\leq q_{1}$. Note that $\tau(P,Q)=\left(1-\frac{e^{\varepsilon}}{2}\right)\frac{1}{k}$. $P'$ and $Q'$ are the distributions created by normalizing $\tilde{P}$ and $\tilde{Q}$, that is, by dividing each by a factor of $1-\tau(P,Q)$. Now, we upper bound the squared Hellinger distance between $P'$ and $Q'$ by the TV distance (see Lemma A.4). Since $\tilde{P}$ and $\tilde{Q}$ differ only on the two extreme regions, each of which has mass $\frac{1}{k}$ under $P$, we get
\[
H^{2}(P',Q')\leq TV(P',Q')=\frac{1}{2(1-\tau(P,Q))}\left[\frac{e^{\varepsilon}-1}{2k}+\frac{e^{\varepsilon}-1}{2k}\right]=\frac{e^{\varepsilon}-1}{2k(1-\tau(P,Q))}=O\left(\frac{\varepsilon}{k}\right),
\]
where the last step uses $e^{\varepsilon}-1\leq 2\varepsilon$ for $\varepsilon<\ln 2$ and $\tau(P,Q)\leq\frac{1}{2k}\leq\frac{1}{2}$.
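
The case $\varepsilon<\ln 2$ can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's code: the three regions carry $P$-mass $(\frac{1}{k},1-\frac{2}{k},\frac{1}{k})$, the squared Hellinger distance is normalized as $\frac{1}{2}\int(\sqrt{f}-\sqrt{g})^{2}$ (the normalization under which $H^{2}\leq TV$ holds), and the name `primed_distance` is hypothetical.

```python
import math

def primed_distance(eps: float, k: float):
    """Return (TV(P', Q'), H^2(P', Q')) for eps < ln 2.

    Assumptions (illustrative): densities are constant multiples of f_P on
    three regions, so every integral reduces to a sum over three atoms.
    """
    e = math.exp(eps)
    masses = [1.0 / k, 1.0 - 2.0 / k, 1.0 / k]
    p_tilde = [e / 2.0, 1.0, 1.0]            # min{e^eps Q, P} / f_P per region
    q_tilde = [0.5, 1.0, (1.0 + e) / 2.0]    # min{e^eps' P, Q} / f_P per region
    tau = (1.0 - e / 2.0) / k
    # Normalize both truncated measures by 1 - tau(P, Q).
    p = [m * a / (1.0 - tau) for m, a in zip(masses, p_tilde)]
    q = [m * b / (1.0 - tau) for m, b in zip(masses, q_tilde)]
    assert abs(sum(p) - 1.0) < 1e-12 and abs(sum(q) - 1.0) < 1e-12
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    h2 = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return tv, h2

# H^2 <= TV, and TV is at most 2 * eps / k on the tested range.
for eps in (0.01, 0.1, 0.5, 0.69):
    for k in (2, 8, 64):
        tv, h2 = primed_distance(eps, k)
        assert h2 <= tv + 1e-15
        assert tv <= 2.0 * eps / k
```

The internal assertion on `sum(p)` and `sum(q)` confirms that both truncated measures have total mass $1-\tau(P,Q)$ before normalization.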

Substituting into the lower bound for the sample complexity of distinguishing $P$ and $Q$, this tells us that for all $\varepsilon\in(0,1]$, $SC_{\varepsilon}(P,Q)=\Omega\left(\frac{1}{\varepsilon\cdot\frac{1}{k}}\right)=\Omega(k/\varepsilon)$.

Proof of Lemma 6.7.

Note that $P$ has bounded expectation (and hence, so does $Q$). Hence, we can use the following form of the Wasserstein distance:

\[
\mathcal{W}(P,Q)=\int_{\mathbb{R}}|F_{P}(t)-F_{Q}(t)|\,dt.
\]

Now, given the settings of $P$ and $Q$, we can precisely write the forms of their cumulative distribution functions as follows. Note that for $L(P)\leq t<q_{1/k}(P)$, we have that $|F_{P}(t)-F_{Q}(t)|=\frac{1}{2}F_{P}(t)$. For $q_{1/k}\leq t\leq q_{1-\frac{1}{k}}$, we have $|F_{P}(t)-F_{Q}(t)|=\frac{1}{2k}$. Finally, for $q_{1-\frac{1}{k}}\leq t\leq q_{1}$, we have that $F_{P}(t)=1-\frac{1}{k}+\int_{q_{1-1/k}}^{t}f_{P}(s)\,ds$ and $F_{Q}(t)=1-\frac{3}{2k}+\frac{3}{2}\int_{q_{1-1/k}}^{t}f_{P}(s)\,ds$, which gives us that $F_{P}(t)-F_{Q}(t)=\frac{1}{2k}-\frac{1}{2}\int_{q_{1-1/k}}^{t}f_{P}(s)\,ds=\frac{1}{2}[1-F_{P}(t)]$.

Hence, we have that

\begin{align*}
\mathcal{W}(P,Q) &= \int_{\mathbb{R}}|F_{P}(t)-F_{Q}(t)|\,dt\\
&= \frac{1}{2}\int_{L(P)}^{q_{1/k}}F_{P}(t)\,dt+\int_{q_{1/k}}^{q_{1-\frac{1}{k}}}|F_{P}(t)-F_{Q}(t)|\,dt+\int_{q_{1-\frac{1}{k}}}^{q_{1}}|F_{P}(t)-F_{Q}(t)|\,dt\\
&\geq \frac{1}{2}\int_{L(P)}^{q_{1/k}}F_{P}(t)\,dt+\frac{1}{2}\int_{q_{1-\frac{1}{k}}}^{q_{1}}[1-F_{P}(t)]\,dt+\frac{1}{2k}\left(q_{1-\frac{1}{k}}-q_{\frac{1}{k}}\right)\\
&= \frac{1}{2}\int_{q_{1-\frac{1}{k}}}^{q_{1}}\Big|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)\Big|\,dt+\frac{1}{2}\int_{L(P)}^{q_{1/k}}\Big|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)\Big|\,dt+\frac{1}{2k}\left(q_{1-\frac{1}{k}}-q_{\frac{1}{k}}\right)\\
&= \frac{1}{2k}\left(q_{1-\frac{1}{k}}-q_{1/k}\right)+\frac{1}{2}\mathcal{W}\left(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right).
\end{align*}
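
To illustrate the final expression on a concrete instance, the sketch below takes $P=\mathrm{Uniform}[0,1]$, so that $q_{a}=a$, and compares $\mathcal{W}(P,Q)$ with the right-hand side by midpoint-rule integration. Two things here are assumptions rather than statements from the paper: the choice of $P$, and the reading of $P|_{a,b}$ as $P$ with the mass outside $[a,b]$ moved onto the nearest endpoint (which is how the derivation above uses its CDF). The name `wasserstein_check` is hypothetical.

```python
def wasserstein_check(k: int, steps: int = 20_000):
    """Return (W(P, Q), right-hand side) for P = Uniform[0, 1].

    Assumptions (illustrative): q_a = a, Q has density 1/2, 1, 3/2 times f_P
    on the bottom 1/k, middle, and top 1/k mass of P respectively.
    """
    lo, hi = 1.0 / k, 1.0 - 1.0 / k

    def F_P(t):
        return t

    def F_Q(t):
        if t < lo:
            return 0.5 * t
        if t <= hi:
            return t - 0.5 / k
        return 1.0 - 1.5 / k + 1.5 * (t - hi)

    def F_clip(t):
        # CDF of P with the mass outside [lo, hi] moved to the nearest endpoint.
        if t < lo:
            return 0.0
        if t < hi:
            return t
        return 1.0

    h = 1.0 / steps
    mids = [(j + 0.5) * h for j in range(steps)]
    w_pq = h * sum(abs(F_P(t) - F_Q(t)) for t in mids)
    w_clip = h * sum(abs(F_P(t) - F_clip(t)) for t in mids)
    rhs = (hi - lo) / (2 * k) + 0.5 * w_clip
    return w_pq, rhs
```

For this uniform instance both sides can also be computed by hand as $\frac{1}{2k}-\frac{1}{2k^{2}}$, so the bound holds with equality here.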

F.2 Omitted proofs in Section 6.1.2

Proof of Lemma 6.10.

The KL divergence is defined as $\int_{t:f_{Q}(t)>0}f_{P}(t)\log\frac{f_{P}(t)}{f_{Q}(t)}\,dt$. This can be broken up into a sum over the dyadic quantiles as:

\begin{align*}
KL(P,Q) &= \sum_{i=2}^{\log n-1}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}f_{P}(t)\log\frac{f_{P}(t)}{f_{Q}(t)}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}f_{P}(t)\log\frac{f_{P}(t)}{f_{Q}(t)}\,dt\right]\\
&\quad+\sum_{i=\log n}^{\infty}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}f_{P}(t)\log\frac{f_{P}(t)}{f_{Q}(t)}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}f_{P}(t)\log\frac{f_{P}(t)}{f_{Q}(t)}\,dt\right]\\
&= \sum_{i=2}^{\log n-1}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}f_{P}(t)\log\frac{1}{1+\sqrt{\frac{2^{i}}{n}}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}f_{P}(t)\log\frac{1}{1-\sqrt{\frac{2^{i}}{n}}}\,dt\right]\\
&\quad+\sum_{i=\log n}^{\infty}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}f_{P}(t)\log\frac{1}{1+\frac{1}{2}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}f_{P}(t)\log\frac{1}{1-\frac{1}{2}}\,dt\right]\\
&= \sum_{i=2}^{\log n-1}\frac{1}{2^{i}}\left[\log\frac{1}{1+\sqrt{\frac{2^{i}}{n}}}+\log\frac{1}{1-\sqrt{\frac{2^{i}}{n}}}\right]+\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}\left[\log\frac{1}{1+\frac{1}{2}}+\log\frac{1}{1-\frac{1}{2}}\right]\\
&\leq \sum_{i=2}^{\log n-1}\frac{1}{2^{i}}\log\frac{1}{1-\frac{2^{i}}{n}}+O\left(\frac{1}{n}\right)\\
&\leq \sum_{i=2}^{\log n-1}\frac{1}{2^{i}}\cdot 2\cdot\frac{2^{i}}{n}+O\left(\frac{1}{n}\right)\\
&= O\left(\frac{\log n}{n}\right),
\end{align*}

where the third-to-last step uses the fact that the geometric series $\sum_{i=\log n}^{\infty}\frac{1}{2^{i}}$ is $O(\frac{1}{n})$, and the second-to-last step uses the facts that $\frac{2^{i}}{n}<1/2$ and that $\log(1/(1-y))<2y$ for $0<y<1/2$. ∎
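As a numerical sanity check of the two elementary facts used in the last steps (this is not part of the original argument), the following minimal sketch verifies $\log(1/(1-y))<2y$ on a grid and evaluates the truncated sum for $n$ a power of two. The helper names `log_bound_holds` and `truncated_sum` are hypothetical, and the logarithms in the summation limits are assumed to be base 2.

```python
import math

def log_bound_holds(num_points=1000):
    """Check log(1/(1-y)) < 2y on a grid of points y in (0, 1/2)."""
    for k in range(1, num_points):
        y = 0.5 * k / num_points  # y ranges over the open interval (0, 1/2)
        if not math.log(1.0 / (1.0 - y)) < 2.0 * y:
            return False
    return True

def truncated_sum(n):
    """Evaluate sum_{i=2}^{log2(n)-1} 2^{-i} * log(1 / (1 - 2^i / n)).

    Assumes n is a power of two, so that log2(n) is an integer.
    """
    top = int(math.log2(n)) - 1
    return sum(2.0 ** -i * math.log(1.0 / (1.0 - 2.0 ** i / n))
               for i in range(2, top + 1))

if __name__ == "__main__":
    print(log_bound_holds())
    for k in (4, 10, 20):
        n = 2 ** k
        # Each of the (k - 2) summands is at most 2/n, giving O(log n / n).
        print(n, truncated_sum(n), 2 * (k - 2) / n)
```

Each summand is at most $2/n$, so the sum of the $\log n-2$ terms is at most $2(\log n-2)/n$, which is the $O(\log n/n)$ bound in the display.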

Proof of Lemma 6.11.

First, we recall the definition of the 1-Wasserstein distance in terms of the cumulative distribution function.

\[\mathcal{W}(P,Q)=\int_{\mathbb{R}}|F_{P}(t)-F_{Q}(t)|\,dt.\]
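To make the CDF formula concrete, here is a minimal sketch that evaluates it exactly for two empirical distributions, whose CDFs are step functions. The helper names `w1_from_cdfs` and `w1_sorted` are hypothetical. The second function uses the standard fact that, for two samples of equal size, the 1-Wasserstein distance equals the average gap between order statistics, and serves only as a cross-check.

```python
def w1_from_cdfs(xs, ys):
    """1-Wasserstein distance between the empirical distributions of xs and ys,
    computed as the integral of |F_xs(t) - F_ys(t)| over t."""
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_x = sum(1 for x in xs if x <= left) / len(xs)
        f_y = sum(1 for y in ys if y <= left) / len(ys)
        total += abs(f_x - f_y) * (right - left)
    return total

def w1_sorted(xs, ys):
    """For equal-size samples: mean absolute gap between order statistics."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

if __name__ == "__main__":
    xs, ys = [0.1, 0.7, 0.4], [0.3, 0.2, 0.9]
    print(w1_from_cdfs(xs, ys), w1_sorted(xs, ys))
```

The two functions agree on equal-size samples, which illustrates that the integral of the CDF gap and the optimal-coupling cost coincide in one dimension.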

Fix any $2\leq i<\log n-1$. Observe that by construction, for all $t\in[q_{1/2^{i}},q_{1/2^{i-1}})$ and for all $t\in[q_{1-1/2^{i-1}},q_{1-1/2^{i}})$, we have that $|F_{P}(t)-F_{Q}(t)|\geq\sum_{j=i+1}^{\log n-1}\frac{1}{\sqrt{2^{j}n}}+\frac{1}{2}\sum_{j=\log n}^{\infty}\frac{1}{2^{j}}$. Similarly, fix any $\log n-1\leq i<\infty$. Observe that for all $t\in[q_{1/2^{i}},q_{1/2^{i-1}})$ and for all $t\in[q_{1-1/2^{i-1}},q_{1-1/2^{i}})$, we have that $|F_{P}(t)-F_{Q}(t)|\geq\frac{1}{2}\sum_{j=i+1}^{\infty}\frac{1}{2^{j}}$. Substituting the above bounds in the formula for the Wasserstein distance, we get that

\begin{align*}
\mathcal{W}(P,Q)&\geq\sum_{i=2}^{\log n-2}\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\left[\sum_{j=i+1}^{\log n-2}\frac{1}{\sqrt{2^{j}n}}+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\frac{1}{2^{j}}\right]dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\left[\sum_{j=i+1}^{\log n-2}\frac{1}{\sqrt{2^{j}n}}+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\frac{1}{2^{j}}\right]dt\\
&\quad+\sum_{i=\log n-1}^{\infty}\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\frac{1}{2}\sum_{j=i+1}^{\infty}\frac{1}{2^{j}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\frac{1}{2}\sum_{j=i+1}^{\infty}\frac{1}{2^{j}}\,dt.
\end{align*}

Pulling the summation over $j$ outside the integral and grouping terms,

\begin{align*}
\mathcal{W}(P,Q)&\geq\sum_{i=2}^{\log n-2}\Bigg[\sum_{j=i+1}^{\log n-2}\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\frac{1}{\sqrt{2^{j}n}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\frac{1}{\sqrt{2^{j}n}}\,dt+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\frac{1}{2^{j}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\frac{1}{2^{j}}\,dt\Bigg]\\
&\quad+\frac{1}{2}\sum_{i=\log n-1}^{\infty}\sum_{j=i+1}^{\infty}\Bigg[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\frac{1}{2^{j}}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\frac{1}{2^{j}}\,dt\Bigg]\\
&=\sum_{i=2}^{\log n-2}\left[(q_{1/2^{i-1}}-q_{1/2^{i}})+(q_{1-1/2^{i}}-q_{1-1/2^{i-1}})\right]\left[\sum_{j=i+1}^{\log n-2}\frac{1}{\sqrt{2^{j}n}}+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\frac{1}{2^{j}}\right]\\
&\quad+\sum_{i=\log n-1}^{\infty}\left[(q_{1/2^{i-1}}-q_{1/2^{i}})+(q_{1-1/2^{i}}-q_{1-1/2^{i-1}})\right]\frac{1}{2}\sum_{j=i+1}^{\infty}\frac{1}{2^{j}}.
\end{align*}

Switching the order of summation (so that the outer sum is over $j$) and grouping terms, we get

\begin{align*}
\mathcal{W}(P,Q)&\geq\sum_{j=3}^{\log n-2}\frac{1}{\sqrt{2^{j}n}}\sum_{i=2}^{j-1}\left[(q_{1/2^{i-1}}-q_{1/2^{i}})+(q_{1-1/2^{i}}-q_{1-1/2^{i-1}})\right]\\
&\quad+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\frac{1}{2^{j}}\sum_{i=2}^{j-1}\left[(q_{1/2^{i-1}}-q_{1/2^{i}})+(q_{1-1/2^{i}}-q_{1-1/2^{i-1}})\right].
\end{align*}

Telescoping the inner sums over $i$, we get that

\begin{align*}
\mathcal{W}(P,Q)&\geq\sum_{j=3}^{\log n-2}\frac{1}{\sqrt{2^{j}n}}\left[q_{1-1/2^{j-1}}-q_{1/2^{j-1}}\right]+\frac{1}{2}\sum_{j=\log n-1}^{\infty}\frac{1}{2^{j}}\left[q_{1-1/2^{j-1}}-q_{1/2^{j-1}}\right].
\end{align*}
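The telescoping identity behind this step can be checked numerically. The sketch below is only an illustration: it uses the quantile function of the standard logistic distribution as an arbitrary concrete choice, and the helper names `quantile`, `inner_sum` and `telescoped` are hypothetical.

```python
import math

def quantile(p):
    """Quantile function of the standard logistic distribution (illustrative choice)."""
    return math.log(p / (1.0 - p))

def inner_sum(j):
    """sum_{i=2}^{j-1} [(q_{1/2^{i-1}} - q_{1/2^i}) + (q_{1-1/2^i} - q_{1-1/2^{i-1}})]."""
    return sum((quantile(2.0 ** -(i - 1)) - quantile(2.0 ** -i))
               + (quantile(1 - 2.0 ** -i) - quantile(1 - 2.0 ** -(i - 1)))
               for i in range(2, j))

def telescoped(j):
    """Closed form after telescoping: q_{1-1/2^{j-1}} - q_{1/2^{j-1}}."""
    return quantile(1 - 2.0 ** -(j - 1)) - quantile(2.0 ** -(j - 1))

if __name__ == "__main__":
    for j in range(3, 8):
        print(j, inner_sum(j), telescoped(j))
```

The median $q_{1/2}$ appears once with each sign and cancels, which is why only the two outermost quantiles survive.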

A change of variables (re-indexing $j-1$ as $j$) then gives

\begin{align*}
\mathcal{W}(P,Q)&\geq\frac{1}{\sqrt{2}}\sum_{j=2}^{\log n-3}\frac{1}{\sqrt{2^{j}n}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right]+\frac{1}{4}\sum_{j=\log n-2}^{\infty}\frac{1}{2^{j}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right]\\
&\geq\frac{1}{4}\sum_{j=2}^{\log n-1}\frac{1}{\sqrt{2^{j}n}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right]+\frac{1}{4}\sum_{j=\log n}^{\infty}\frac{1}{2^{j}}\left[q_{1-1/2^{j}}-q_{1/2^{j}}\right],
\end{align*}

where the last inequality follows by moving the first two terms of the second summation (those with $j=\log n-2$ and $j=\log n-1$) into the first summation, using the fact that for these values of $j$ we have $\frac{1}{4\cdot 2^{j}}\geq\frac{1}{2\sqrt{2}}\frac{1}{\sqrt{2^{j}n}}$. ∎
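The coefficient comparison used in the last step is easy to confirm numerically. The sketch below assumes $n$ is a power of two and base-2 logarithms; the helper name `coefficient_gap` is hypothetical.

```python
import math

def coefficient_gap(n, j):
    """Return 1/(4 * 2^j) - 1/(2*sqrt(2) * sqrt(2^j * n)).

    The value is nonnegative exactly when the inequality in the text holds.
    """
    return 1.0 / (4 * 2 ** j) - 1.0 / (2 * math.sqrt(2) * math.sqrt(2 ** j * n))

if __name__ == "__main__":
    for k in (4, 10, 20):
        n = 2 ** k
        # j = log n - 2 and j = log n - 1 are the two terms moved between the sums.
        print(n, coefficient_gap(n, k - 2), coefficient_gap(n, k - 1))
```

For $j=\log n-1$ both sides equal $\frac{1}{2n}$, so the inequality holds with equality there (up to floating-point rounding in the sketch), while for $j=\log n-2$ it is strict.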

Proof of Lemma 6.12.

We first state a theorem of Bobkov and Ledoux [BL19].

Theorem F.1 (Theorem 3.5, [BL19]).

There is an absolute constant $c>0$ such that for all distributions $P$ over $\mathbb{R}$ and for every $n\geq 1$,

\[c(A_{n}+B_{n})\leq\mathbb{E}[\mathcal{W}(P,\hat{P}_{n})]\leq A_{n}+B_{n},\]

where

\[A_{n}=2\int_{F(t)[1-F(t)]\leq\frac{1}{4n}}F(t)[1-F(t)]\,dt,\]

and

\[B_{n}=\frac{1}{\sqrt{n}}\int_{F(t)[1-F(t)]\geq\frac{1}{4n}}\sqrt{F(t)[1-F(t)]}\,dt.\]
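To illustrate the quantities in Theorem F.1, the following minimal sketch approximates $A_{n}$ and $B_{n}$ for the illustrative choice $P=\mathrm{Uniform}[0,1]$ (so $F(t)=t$ on $[0,1]$) and compares their sum with a Monte Carlo estimate of $\mathbb{E}[\mathcal{W}(P,\hat{P}_{n})]$. The function names, grid sizes, number of trials and seed are all assumptions made for this example, and the integrals are only approximated by a midpoint rule.

```python
import math
import random

def bobkov_ledoux_terms(n, grid=100_000):
    """Midpoint-rule approximation of A_n and B_n for P = Uniform[0, 1]."""
    a_n, b_n = 0.0, 0.0
    dt = 1.0 / grid
    for k in range(grid):
        t = (k + 0.5) * dt
        v = t * (1.0 - t)  # F(t)[1 - F(t)] with F(t) = t
        if v <= 1.0 / (4 * n):
            a_n += 2.0 * v * dt
        else:
            b_n += math.sqrt(v) * dt / math.sqrt(n)
    return a_n, b_n

def empirical_w1_uniform(sample, grid=2_000):
    """Approximate W(P, P_hat_n) for P = Uniform[0, 1] via the CDF formula."""
    xs = sorted(sample)
    n = len(xs)
    total, idx = 0.0, 0
    dt = 1.0 / grid
    for k in range(grid):
        t = (k + 0.5) * dt
        while idx < n and xs[idx] <= t:
            idx += 1  # idx / n is the empirical CDF at t
        total += abs(idx / n - t) * dt
    return total

def mean_w1(n, trials=200, seed=0):
    """Monte Carlo estimate of E[W(P, P_hat_n)] over independent samples of size n."""
    rng = random.Random(seed)
    return sum(empirical_w1_uniform([rng.random() for _ in range(n)])
               for _ in range(trials)) / trials

if __name__ == "__main__":
    a_n, b_n = bobkov_ledoux_terms(100)
    print("A_n + B_n:", a_n + b_n, "Monte Carlo mean:", mean_w1(100))
```

For this distribution $B_{n}$ dominates and is close to $\frac{\pi}{8\sqrt{n}}$, since $\int_{0}^{1}\sqrt{t(1-t)}\,dt=\pi/8$, while $A_{n}$ only collects the two small tail regions where $F(t)[1-F(t)]\leq\frac{1}{4n}$.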

Now we are ready to prove the lemma. Fix a natural number $i\geq 2$. Restricted to $t\leq q_{1/2}$, $F_{P}(t)(1-F_{P}(t))$ is an increasing function of $t$, and hence for $t\in[q_{1/2^{i}},q_{1/2^{i-1}}]$, we have that $F_{P}(t)(1-F_{P}(t))\leq\frac{1}{2^{i-1}}[1-\frac{1}{2^{i-1}}]$.

Similarly, restricted to $t>q_{1/2}$, $F_{P}(t)(1-F_{P}(t))$ is a decreasing function of $t$, and hence for $t\in[q_{1-1/2^{i-1}},q_{1-1/2^{i}}]$, we have that $F_{P}(t)(1-F_{P}(t))\leq\frac{1}{2^{i-1}}[1-\frac{1}{2^{i-1}}]$.

Using this, we can now upper bound the expected Wasserstein distance between $P$ and its empirical distribution via Theorem F.1, for which it suffices to upper bound the terms $B_{n}$ and $A_{n}$. We start by upper bounding $B_{n}$. Note that for all $t\not\in[q_{\frac{1}{4n}},q_{1-\frac{1}{4n}}]$, we have that $F_{P}(t)(1-F_{P}(t))\leq\frac{1}{4n}$. Hence,

\begin{align*}
B_n &= \frac{1}{\sqrt{n}}\int_{F_P(t)[1-F_P(t)]\geq\frac{1}{4n}}\sqrt{F_P(t)[1-F_P(t)]}\,dt \\
&\leq \frac{1}{\sqrt{n}}\int_{q_{\frac{1}{4n}}}^{q_{1-\frac{1}{4n}}}\sqrt{F_P(t)[1-F_P(t)]}\,dt \\
&\leq \sum_{i=2}^{\log 4n}\frac{1}{\sqrt{n}}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\sqrt{F_P(t)[1-F_P(t)]}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\sqrt{F_P(t)[1-F_P(t)]}\,dt\right] \\
&\leq \sum_{i=2}^{\log 4n}\frac{1}{\sqrt{n}}\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\sqrt{\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]}\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\sqrt{\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]}\,dt\right] \\
&= \sum_{i=2}^{\log 4n}\frac{1}{\sqrt{n}}\sqrt{\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]}\left[q_{1/2^{i-1}}-q_{1/2^{i}}+q_{1-1/2^{i}}-q_{1-1/2^{i-1}}\right] \\
&\leq \sum_{i=2}^{\log 4n}\frac{2}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&= \sum_{i=2}^{\log n-1}\frac{2}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\log 4n}\frac{2}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right] \\
&\leq \sum_{i=2}^{\log n-1}\frac{2}{\sqrt{2^{i}n}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]+\sum_{i=\log n}^{\log 4n}\frac{4}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right],
\end{align*}

where the last inequality is because for $i \leq \log(4n)$, we have that $\frac{1}{n} \leq \frac{4}{2^{i}}$.
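The display above leaves three elementary steps implicit. The following sketch spells them out; the only property of quantiles it assumes is the ordering $q_{1/2^{i-1}} \leq q_{1-1/2^{i-1}}$, which holds for $i \geq 2$ since then $1/2^{i-1} \leq 1 - 1/2^{i-1}$.

\begin{align*}
\sqrt{\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]} &\leq \sqrt{\frac{2}{2^{i}}} \leq \frac{2}{\sqrt{2^{i}}}, \\
q_{1/2^{i-1}}-q_{1/2^{i}}+q_{1-1/2^{i}}-q_{1-1/2^{i-1}} &= \left[q_{1-1/2^{i}}-q_{1/2^{i}}\right]-\left[q_{1-1/2^{i-1}}-q_{1/2^{i-1}}\right] \leq q_{1-1/2^{i}}-q_{1/2^{i}}, \\
\frac{2}{\sqrt{2^{i}n}} = \frac{2}{\sqrt{2^{i}}}\sqrt{\frac{1}{n}} &\leq \frac{2}{\sqrt{2^{i}}}\cdot\frac{2}{\sqrt{2^{i}}} = \frac{4}{2^{i}} \quad \text{whenever } \frac{1}{n}\leq\frac{4}{2^{i}}.
\end{align*}

The first two lines give the passage from the factored sum to the $\frac{2}{\sqrt{2^{i}n}}$ bound, and the third gives the final inequality on the range $\log n \leq i \leq \log 4n$.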

Next, we bound $A_n$. Note that for all $t$ with $q_{\frac{1}{2n}} \leq t \leq q_{1-\frac{1}{2n}}$, we have that $F_P(t)(1-F_P(t)) \not\leq \frac{1}{4n}$. Hence,

\begin{align*}
A_n &= 2\int_{F_P(t)[1-F_P(t)]\leq\frac{1}{4n}}F_P(t)[1-F_P(t)]\,dt \\
&\leq 2\left[\int_{-\infty}^{q_{\frac{1}{2n}}}F_P(t)[1-F_P(t)]\,dt+\int_{q_{1-\frac{1}{2n}}}^{\infty}F_P(t)[1-F_P(t)]\,dt\right] \\
&= \sum_{i=1+\log 2n}^{\infty}2\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}F_P(t)[1-F_P(t)]\,dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}F_P(t)[1-F_P(t)]\,dt\right] \\
&\leq \sum_{i=1+\log 2n}^{\infty}2\left[\int_{q_{1/2^{i}}}^{q_{1/2^{i-1}}}\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]dt+\int_{q_{1-1/2^{i-1}}}^{q_{1-1/2^{i}}}\frac{1}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]dt\right] \\
&= \sum_{i=1+\log 2n}^{\infty}\frac{2}{2^{i-1}}\left[1-\frac{1}{2^{i-1}}\right]\left[q_{1/2^{i-1}}-q_{1/2^{i}}+q_{1-1/2^{i}}-q_{1-1/2^{i-1}}\right] \\
&\leq \sum_{i=1+\log 2n}^{\infty}\frac{4}{2^{i}}\left[q_{1-1/2^{i}}-q_{1/2^{i}}\right].
\end{align*}

Then, using the upper bound in Theorem F.1, substituting in the bounds for $A_n$ and $B_n$, and simplifying, we get the claim. ∎

Proof of Claim 6.13.

By the definition of Wasserstein distance and restrictions of distributions, we have that

\begin{align*}
\mathcal{W}\left(\hat{P}_n|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right) &= \int_{a}^{b}\left|F_{\hat{P}_n|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)\right|dt \\
&= \int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\left|F_{\hat{P}_n|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)\right|dt \\
&= \int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\left|F_{\hat{P}_n}(t)-F_P(t)\right|dt \leq \mathcal{W}(P,\hat{P}_n).
\end{align*}
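The final inequality above only uses the fact that integrating $|F_{\hat{P}_n}(t)-F_P(t)|$ over a sub-interval cannot exceed the integral over the whole line. A minimal Python sketch of this, assuming two small empirical samples stand in for $P$ and $\hat{P}_n$ (the function names and the sample values are hypothetical, chosen only for illustration):

```python
from bisect import bisect_right


def empirical_cdf(ordered, t):
    """Empirical CDF at t: fraction of the (sorted) sample that is <= t."""
    return bisect_right(ordered, t) / len(ordered)


def cdf_gap_integral(xs, ys, lo=float("-inf"), hi=float("inf")):
    """Exact integral of |F_xs(t) - F_ys(t)| over [lo, hi].

    Both empirical CDFs are right-continuous step functions, so the
    integrand is constant between consecutive points of the pooled sample.
    """
    xs, ys = sorted(xs), sorted(ys)
    pooled = sorted(set(xs) | set(ys))
    # Outside [pooled[0], pooled[-1]] both CDFs agree (both 0 or both 1).
    lo, hi = max(lo, pooled[0]), min(hi, pooled[-1])
    if lo >= hi:
        return 0.0
    grid = [lo] + [p for p in pooled if lo < p < hi] + [hi]
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = abs(empirical_cdf(xs, left) - empirical_cdf(ys, left))
        total += gap * (right - left)
    return total


def w1_sorted_coupling(xs, ys):
    """W1 between equal-size samples via the sorted (monotone) coupling."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


xs = [0.1, 0.4, 0.7, 0.9]
ys = [0.2, 0.3, 0.8, 1.0]
full = cdf_gap_integral(xs, ys)                      # integral over the whole line
middle = cdf_gap_integral(xs, ys, lo=0.25, hi=0.75)  # integral over a sub-interval
print(full, w1_sorted_coupling(xs, ys), middle)
```

In one dimension the full integral coincides with the Wasserstein distance computed from the sorted coupling, while the restricted integral is never larger, which is the shape of the argument in the claim.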

F.3 Omitted Proofs in Section 6.2

Before going into the proofs, we state the standard Chernoff concentration bound that we will use multiple times.

Theorem F.2 (Binomial Concentration).

Let $X \sim \mathrm{Bin}(n,p)$ with expectation $\mu = np$, and $0 < \delta < 1$. Then,

\[
\Pr\left(|X-\mu| \geq \delta\mu\right) \leq 2e^{-\frac{\delta^{2}\mu}{3}}.
\]
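As a sanity check on how the bound is used, the following Python sketch compares the exact two-sided binomial tail with the right-hand side of Theorem F.2. The function names and the parameter triples are hypothetical and chosen only for illustration; the exact tail is computed in log-space to avoid overflow.

```python
import math


def binomial_two_sided_tail(n, p, delta):
    """Exact Pr(|X - mu| >= delta * mu) for X ~ Bin(n, p), mu = n * p (0 < p < 1)."""
    mu = n * p
    total = 0.0
    for j in range(n + 1):
        if abs(j - mu) >= delta * mu:
            # log of the binomial pmf at j, via log-gamma
            log_pmf = (
                math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                + j * math.log(p) + (n - j) * math.log(1 - p)
            )
            total += math.exp(log_pmf)
    return total


def chernoff_bound(n, p, delta):
    """Right-hand side of the bound: 2 * exp(-delta^2 * mu / 3)."""
    return 2.0 * math.exp(-(delta ** 2) * n * p / 3.0)


for n, p, delta in [(200, 0.1, 0.5), (500, 0.05, 0.9), (1000, 0.02, 0.3)]:
    print(n, p, delta, binomial_two_sided_tail(n, p, delta), chernoff_bound(n, p, delta))
```

The bound is loose for small $\delta^{2}\mu$ (it can exceed 1) and becomes informative once $\mu$ is a sufficiently large multiple of $1/\delta^{2}$, which is the regime the proofs below rely on.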
Proof of Lemma 6.16.
\begin{align}
\mathcal{W}(P,P^{DP}) &= \int_{t}\left|F_P(t)-F_{P^{DP}}(t)\right|dt \tag{13} \\
&\leq \int_{t=a}^{q_{1/k}}\left|F_P(t)-F_{P^{DP}}(t)\right|dt+\int_{t=q_{1/k}}^{q_{1-1/k}}\left|F_P(t)-F_{P^{DP}}(t)\right|dt+\int_{t=q_{1-1/k}}^{b}\left|F_P(t)-F_{P^{DP}}(t)\right|dt \tag{14}
\end{align}

Note that for all $t \in [q_{1/k}, q_{1-1/k}]$, the cumulative distribution functions of $P$ and its restricted version are identical, and likewise for $P^{DP}$. Additionally, the cumulative distribution functions of the restricted versions of the two distributions are identical to each other outside of this interval. Hence, we can simplify the middle term in the RHS of the inequality above as follows:

\[
\int_{t=q_{1/k}}^{q_{1-1/k}}\left|F_P(t)-F_{P^{DP}}(t)\right|dt = \mathcal{W}\left(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\,P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}\right).
\]

Next, we reason about the remaining terms.

Consider the term $\int_{t=a}^{q_{1/k}}|F_P(t)-F_{P^{DP}}(t)|\,dt$. First, condition on the event in Theorem E.3 (on the accuracy of the private quantiles for the empirical distribution), which tells us that with probability at least $1-\beta$, we have for all $r \in [k]$ that

\begin{equation}
\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}} \leq \tilde{q}_{\frac{2r-1}{2k}} \leq \hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}}, \tag{15}
\end{equation}

which implies in particular that $\hat{q}_{1/4k} \leq \tilde{q}_{1/2k} \leq \hat{q}_{3/4k}$.

Next, we argue that $\hat{q}_{1/4k} \geq q_{1/8k}$ with high probability. By the definition of quantiles, we have that $\Pr_{y\sim P}(y < q_{1/8k}) < \frac{1}{8k}$. The number of entries in the dataset $\mathbf{x}$ less than $q_{1/8k}$ is hence a Binomial random variable with mean less than $\frac{n}{8k}$, and so, by Theorem F.2 (with $\delta$ set to $0.9$), with probability at least $1-\beta$ the number of entries in the dataset less than $q_{1/8k}$ is at most $1.9\,\frac{n}{8k} < \frac{n}{4k}$. This means that the total mass less than $q_{\frac{1}{8k}}$ in the empirical distribution is less than $\frac{1}{4k}$, which implies that $\hat{q}_{1/4k} \geq q_{1/8k}$ by the definition of quantiles.
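The counting argument in the preceding paragraph can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it takes $P$ to be uniform on $[0,1]$ (so that $q_{\alpha}=\alpha$), uses one common convention for the empirical quantile, and the function names, sample size, and seed are hypothetical.

```python
import math
import random


def empirical_quantile(sample, alpha):
    """Smallest sample value v such that at least an alpha fraction of the sample is <= v."""
    ordered = sorted(sample)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


def lower_quantile_check(n, k, seed=0):
    """Draw n points from Uniform[0, 1] and compare q_hat_{1/4k} with q_{1/8k}."""
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(n)]
    q_true = 1.0 / (8 * k)  # q_{1/8k} for the uniform distribution
    count_below = sum(1 for x in sample if x < q_true)
    q_hat = empirical_quantile(sample, 1.0 / (4 * k))
    return count_below, q_hat, q_true


n, k = 20000, 10
count_below, q_hat, q_true = lower_quantile_check(n, k)
print(count_below, n / (4 * k), q_hat, q_true)
```

Whenever fewer than $\frac{n}{4k}$ sample points fall below $q_{1/8k}$, the empirical $\frac{1}{4k}$-quantile cannot lie below $q_{1/8k}$, which is what the simulation reports for this choice of $n$ and $k$.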

Additionally, note that for all $t < q_{1/k}$, $F_P(t) < \frac{1}{k}$. The number of entries in the dataset $\mathbf{x}$ that are less than $q_{1/k}$ is hence a Binomial random variable with success probability less than $\frac{1}{k}$. By Theorem F.2, we can again argue that with probability at least $1-\beta$, there is a constant $c'$ such that the total mass of the empirical distribution on values less than $q_{1/k}$ is less than $\frac{c'}{k}$. Hence, $q_{1/k} \leq \hat{q}_{c'/k}$. By Equation 15, this implies that $q_{1/k} \leq \tilde{q}_{c/k}$ for some constant $c$.
Hence, for all $t < q_{1/k}$, we have that $F_{P^{DP}}(t) \leq \frac{c}{k}$.

Hence, taking a union bound, with probability at least $1-O(\beta)$,

\begin{align*}
\int_{t=a}^{q_{1/k}}|F_{P}(t)-F_{P^{DP}}(t)|dt
&=\int_{t=a}^{\tilde{q}_{1/2k}}|F_{P}(t)-F_{P^{DP}}(t)|dt+\int_{\tilde{q}_{1/2k}}^{q_{1/k}}|F_{P}(t)-F_{P^{DP}}(t)|dt\\
&\leq\int_{t=a}^{\tilde{q}_{1/2k}}|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)|dt+\int_{q_{1/8k}}^{q_{1/k}}\left|F_{P}(t)-\frac{c}{k}\right|dt\\
&\leq\int_{t=a}^{\tilde{q}_{1/2k}}|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)|dt+\int_{q_{1/8k}}^{q_{1/k}}|F_{P}(t)-8cF_{P}(t)|dt\\
&\leq(1-8c)\left[\int_{t=a}^{\tilde{q}_{1/2k}}|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)|dt+\int_{q_{1/8k}}^{q_{1/k}}|F_{P}(t)|dt\right]\\
&\leq(1-8c)\left[\int_{t=a}^{\tilde{q}_{1/2k}}|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)|dt+\int_{q_{1/8k}}^{q_{1/k}}|F_{P}(t)-F_{P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)|dt\right]\\
&\leq 2(1-8c)\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})
\end{align*}

By a symmetric argument, we also have that with probability at least $1-O(\beta)$,

\[
\int_{t=q_{1-1/k}}^{b}|F_{P}(t)-F_{P^{DP}}(t)|dt\leq 2(1-8c)\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}).
\]

Taking a union bound to ensure that all terms in Equation 14 are bounded as required, the proof is complete. ∎

Proof of Lemma 6.17.

First, we condition on the event in Corollary E.3 (on the accuracy of differentially private quantile estimates) that for all $r\in[k]$,

\[
\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}}\leq\tilde{q}_{r}\leq\hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}},
\]

where this event happens with probability at least $1-\beta$ over the randomness of the algorithm.

Observe that this implies that $F_{P^{DP}}$ increases by $\frac{1}{k}$ somewhere in the range $[\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}},\hat{q}_{\frac{2r-1}{2k}+\frac{1}{4k}}]$ (for all $r\in[k]$) and remains constant outside these intervals.

Now, we show that for all $t\in[a,b]$, we have that $|F_{P^{DP}}(t)-F_{\hat{P}_{n}}(t)|\leq\frac{2}{k}$.

For any $t\in[a,\hat{q}_{\frac{1}{4k}})$ (if such a $t$ exists), we have that $F_{P^{DP}}(t)=0$, and $F_{\hat{P}_{n}}(t)\leq\frac{1}{4k}$, which implies that $|F_{P^{DP}}(t)-F_{\hat{P}_{n}}(t)|\leq\frac{1}{4k}$. If there exists no such $t$, then we have that $a=\hat{q}_{\frac{1}{4k}}$, and the corresponding interval collapses to a single point (which will fall in another interval considered below).

Next, fix any $r\in[k-1]$. Note that if there exists $t\in[\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}},\hat{q}_{\frac{2r+1}{2k}-\frac{1}{4k}})$, we have for all such $t$ that $\frac{r-1}{k}\leq F_{P^{DP}}(t)<\frac{r}{k}$, and $\frac{2r-1}{2k}-\frac{1}{4k}\leq F_{\hat{P}_{n}}(t)\leq\frac{2r+1}{2k}+\frac{1}{4k}$. This implies that for all such $t$, $|F_{P^{DP}}(t)-F_{\hat{P}_{n}}(t)|\leq\frac{2}{k}$.
If there exists no such $t$, then we have that $\hat{q}_{\frac{2r-1}{2k}-\frac{1}{4k}}=\hat{q}_{\frac{2r+1}{2k}-\frac{1}{4k}}$, and this $r$ is not relevant since the corresponding interval collapses to a single point (that is considered in another interval).

Finally, for $t\in[\hat{q}_{\frac{2k-1}{2k}},b]$, we have that $F_{P^{DP}}(t)\geq 1-\frac{1}{k}$, and $F_{\hat{P}_{n}}(t)\geq 1-\frac{1}{2k}$, so we have that $|F_{P^{DP}}(t)-F_{\hat{P}_{n}}(t)|\leq\frac{1}{k}$.

Note that every $t\in[a,b]$ is considered in some interval above and hence we have shown that for all $t\in[a,b]$, we have that $|F_{P^{DP}}(t)-F_{\hat{P}_{n}}(t)|\leq\frac{2}{k}$.
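The uniform CDF bound can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes that $P^{DP}$ places mass $\frac{1}{k}$ on each estimate $\tilde{q}_r$, it uses the quantile convention "smallest sample whose empirical CDF is at least the level", and it replaces the private estimates by exact empirical quantiles at level $\frac{2r-1}{2k}$, which trivially lie inside the bracket from Corollary E.3. The names `empirical_quantile` and `step_cdf` are hypothetical helpers.

```python
import bisect
import random

def empirical_quantile(sorted_xs, num, den):
    """Smallest sample x with F_hat(x) >= num/den (assumed quantile convention)."""
    n = len(sorted_xs)
    idx = -(-num * n // den)          # ceil(num * n / den) in integer arithmetic
    return sorted_xs[max(idx, 1) - 1]

def step_cdf(points, t, k):
    """CDF at t of the distribution placing mass 1/k on each of the k sorted points."""
    return bisect.bisect_right(points, t) / k

rng = random.Random(0)
n, k = 1000, 10                       # hypothetical sizes
xs = sorted(rng.random() for _ in range(n))

# Stand-in for the private quantile estimates (no noise is added here).
q_tilde = [empirical_quantile(xs, 2 * r - 1, 2 * k) for r in range(1, k + 1)]

# Both CDFs are step functions that only jump at sample points, and both are
# zero before the smallest sample, so comparing them at the samples suffices.
sup_gap = max(abs(step_cdf(q_tilde, x, k) - bisect.bisect_right(xs, x) / n)
              for x in xs)
print(sup_gap, 2 / k)
```

With noisy estimates that stay inside the bracket, the gap can grow, but the case analysis above shows it never exceeds $\frac{2}{k}$.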

Finally, using the formula for Wasserstein distance (and the definition of a restriction), we have that

\begin{align}
\mathcal{W}(\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})
&=\int_{a}^{b}\left|F_{\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)-F_{P^{DP}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}}(t)\right|dt \tag{16}\\
&=\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\left|F_{\hat{P}_{n}}(t)-F_{P^{DP}}(t)\right|dt \tag{17}\\
&\leq\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\frac{2}{k}dt \tag{18}\\
&\leq\frac{2}{k}\left(q_{1-1/k}-q_{1/k}\right) \tag{19}
\end{align}

Before the proof of Claim 6.18, we state the following variance-dependent version of the DKW inequality that uniformly bounds the absolute difference in CDFs between the true and empirical distribution.

Theorem F.3 (See for example Theorem 1.2 in [BM23]).

Fix $n>0$. There are absolute constants $c_{0},c_{1}$ such that for all $\Delta\geq\frac{c_{0}\log\log n}{n}$,

\[
\Pr\left[\sup_{t:F_{P}(t)(1-F_{P}(t))\geq\Delta}\Big|F_{P}(t)-F_{\hat{P}_{n}}(t)\Big|\geq\sqrt{\Delta\cdot F_{P}(t)(1-F_{P}(t))}\right]\leq 2e^{-c_{1}\Delta n}
\]

We also state the following lemma on Binomial random variables, which is a simple consequence of a lemma by Bobkov and Ledoux [BL19].

Lemma F.4 (Lemma 3.8 in [BL19]).

Let $S_{n}=\sum_{i=1}^{n}\eta_{i}$ be the sum of $n$ independent Bernoulli random variables with $\Pr[\eta_{i}=1]=p$ and $\Pr[\eta_{i}=0]=q=1-p$ (for all $i$). Also assume $p\in[\frac{1}{n},1-\frac{1}{n}]$. Then, for some sufficiently small constant $c$,

\[
c\sqrt{npq}\leq\mathbb{E}[|S_{n}-np|]\leq\sqrt{npq}
\]
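To get a feel for the two-sided bound in Lemma F.4, the sketch below computes $\mathbb{E}[|S_{n}-np|]$ exactly from the Binomial pmf and compares it with $\sqrt{npq}$. The sample size and the grid of values of $p$ are hypothetical, and `mean_abs_deviation` is a helper introduced only for this illustration. The upper bound is just Jensen's inequality applied to the variance; the lower bound is the substantive part of the lemma.

```python
import math

def mean_abs_deviation(n, p):
    """Exact E|S_n - n p| for S_n ~ Bin(n, p), summing the pmf in log-space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for j in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += abs(j - n * p) * math.exp(log_pmf)
    return total

n = 200  # hypothetical sample size
ratios = {}
for p in (1 / n, 0.05, 0.3, 0.5, 1 - 1 / n):
    ratios[p] = mean_abs_deviation(n, p) / math.sqrt(n * p * (1 - p))
    print(f"p = {p:.3f}: E|S_n - np| / sqrt(npq) = {ratios[p]:.3f}")
```

On this grid the ratio stays bounded away from zero even at the endpoints $p=\frac{1}{n}$ and $p=1-\frac{1}{n}$, consistent with the existence of a constant $c$ in the lemma.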
Proof of Claim 6.18.

Now, by the formula for Wasserstein distance, the definition of restriction, and Fubini’s theorem, we have that

\[
\mathbb{E}[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})]=\mathbb{E}\Big[\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\Big|F_{P}(t)-F_{\hat{P}_{n}}(t)\Big|dt\Big]=\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\mathbb{E}\Big[\Big|F_{P}(t)-F_{\hat{P}_{n}}(t)\Big|\Big]dt
\]

By Lemma F.4, using the fact that $F_{\hat{P}_{n}}(t)=\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}[x_{i}\leq t]$, where each term in the sum is an independent Bernoulli random variable with expectation $F_{P}(t)$, with $q_{\frac{1}{k}}\leq t<q_{1-\frac{1}{k}}$ (ensuring that the conditions of the lemma are met), we get that $\mathbb{E}\Big[\Big|F_{P}(t)-F_{\hat{P}_{n}}(t)\Big|\Big]\geq c\sqrt{\frac{F_{P}(t)[1-F_{P}(t)]}{n}}$, which gives

\[
\mathbb{E}[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})]\geq c\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\sqrt{\frac{F_{P}(t)[1-F_{P}(t)]}{n}}dt
\]

Now, consider the random variable $\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})$. Note that $\frac{1}{k}\geq\frac{c_{3}\log\frac{n}{\beta}}{n}$ (for an appropriately chosen $c_{3}$), and so we are in the regime where we can apply Theorem F.3 for an appropriately chosen $\Delta$.

In particular, we have that for $t\in[q_{\frac{1}{k}},q_{1-\frac{1}{k}})$, $F_{P}(t)\in[\frac{1}{k},1-\frac{1}{k})$.

Setting $\Delta=\frac{\log\frac{n}{\beta}}{c_{1}n}$, we have that $\Delta\geq c_{0}\frac{\log\log n}{n}$, and $\Delta\leq\frac{1}{2k}$ (the second inequality for sufficiently large $c_{3}$). In particular, this implies that for $t\in[q_{\frac{1}{k}},q_{1-\frac{1}{k}})$, $F_{P}(t)\in[2\Delta,1-2\Delta)$, which implies that $F_{P}(t)(1-F_{P}(t))\geq\Delta$, as long as $n>c_{4}\log\frac{n}{\beta}$ for some sufficiently large constant $c_{4}$.
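The last implication can be spelled out as a short worked step. The sketch below assumes $c_{4}\geq 4/c_{1}$, which is one way to read "sufficiently large" here; under that assumption $n>c_{4}\log\frac{n}{\beta}$ gives $\Delta\leq\frac{1}{4}$. Since $x\mapsto x(1-x)$ is concave, its minimum over $[2\Delta,1-2\Delta]$ is attained at an endpoint, so for every $t$ in the stated range

\[
F_{P}(t)(1-F_{P}(t))\;\geq\;2\Delta(1-2\Delta)\;\geq\;2\Delta\cdot\tfrac{1}{2}\;=\;\Delta ,
\]

where the second inequality uses $1-2\Delta\geq\frac{1}{2}$, that is, $\Delta\leq\frac{1}{4}$.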

Now, using Theorem F.3, we have that with probability at least $1-2e^{-c_{1}\frac{\log\frac{n}{\beta}}{c_{1}n}n}\geq 1-O(\beta)$,

\[
\sup_{t\in[q_{\frac{1}{k}},q_{1-\frac{1}{k}})}\Big|F_{P}(t)-F_{\hat{P}_{n}}(t)\Big|\leq\sqrt{\frac{\log\frac{n}{\beta}}{c_{1}n}F_{P}(t)(1-F_{P}(t))}
\]

Condition on this event for the rest of the proof. Then, we can write the following chain of inequalities.

\[
\begin{aligned}
\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}}) &=\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}|F_{P}(t)-F_{\hat{P}_{n}}(t)|\,dt\\
&\leq\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\sqrt{\frac{\log\frac{n}{\beta}}{c_{1}n}F_{P}(t)(1-F_{P}(t))}\,dt\\
&\leq\sqrt{c_{5}\log\frac{n}{\beta}}\int_{q_{\frac{1}{k}}}^{q_{1-\frac{1}{k}}}\sqrt{\frac{F_{P}(t)(1-F_{P}(t))}{n}}\,dt\\
&\leq\sqrt{c_{6}\log\frac{n}{\beta}}\;\mathbb{E}\left[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right]
\end{aligned}
\]

as required.
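The two quantities compared in this proof, the CDF form of the Wasserstein distance and the integral of $\sqrt{F_{P}(t)(1-F_{P}(t))/n}$, can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the proof: it takes $P$ to be Uniform$[0,1]$, uses the unrestricted distributions rather than the restrictions to $[q_{\frac{1}{k}},q_{1-\frac{1}{k}})$, and all function names are hypothetical. For this choice of $P$ the integral of $\sqrt{t(1-t)/n}$ over $[0,1]$ equals $\frac{\pi}{8\sqrt{n}}$, and the mean of $\mathcal{W}(P,\hat{P}_{n})$ is at most this value by the Cauchy-Schwarz inequality applied pointwise in $t$.

```python
import math
import random


def w1_uniform_vs_empirical(xs):
    """W1 between Uniform[0,1] and the empirical distribution of xs,
    computed as the integral over [0,1] of |t - F_n(t)| dt."""
    xs = sorted(xs)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        # The empirical CDF F_n equals i/n on [knots[i], knots[i+1]).
        lo, hi, level = knots[i], knots[i + 1], i / n
        # Closed form of the integral of |t - level| over [lo, hi].
        if level <= lo:
            total += ((hi - level) ** 2 - (lo - level) ** 2) / 2
        elif level >= hi:
            total += ((level - lo) ** 2 - (level - hi) ** 2) / 2
        else:
            total += ((level - lo) ** 2 + (hi - level) ** 2) / 2
    return total


def sqrt_variance_integral(n):
    """Integral over [0,1] of sqrt(t(1-t)/n) dt for the uniform CDF F(t) = t."""
    return math.pi / (8 * math.sqrt(n))


def mean_w1(n, trials=200, seed=0):
    """Monte Carlo estimate of E[W1(P, P_hat_n)] for P = Uniform[0,1]."""
    rng = random.Random(seed)
    draws = (
        w1_uniform_vs_empirical([rng.random() for _ in range(n)])
        for _ in range(trials)
    )
    return sum(draws) / trials


if __name__ == "__main__":
    n = 200
    print("estimated E[W1]:", mean_w1(n))
    print("integral of sqrt(F(1-F)/n):", sqrt_variance_integral(n))
```

The sketch only compares the two quantities for one convenient distribution; it does not reproduce the constants $c$, $c_{5}$, $c_{6}$ or the high-probability statement of the proof.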

F.4 Local Minimality in the One-Dimensional Setting

In this subsection, we argue that the instance-optimal algorithm discussed in Section 6.2 is also locally minimal (see Section 3.2 for a discussion of local minimality).

First, we state a corollary of our upper bound for continuous distributions, Theorem 6.14. This corollary follows by discretizing the distribution and applying the previous upper bound to the discretized distribution. The parameters of the discretized distribution are related to those of the original distribution via simple coupling arguments.

Corollary F.5.

Fix $\varepsilon,\beta\in(0,1]$, $a,b\in\mathbb{R}$, $n\in\mathbb{N}$. Let $P$ be any continuous distribution supported on $[a,b]$. Consider any $\gamma<b-a\in\mathbb{R}$ (such that $\gamma$ divides $b-a$), and let $n>c_{2}\frac{\log^{4}\frac{b-a}{\gamma\beta\varepsilon}}{\varepsilon}$ for some sufficiently large constant $c_{2}$. Then, there exists an algorithm that, when given inputs ${\bf x}\sim P^{n}$, privacy parameter $\varepsilon$, interval endpoints $a,b$, granularity $\gamma$, and access to algorithm $A_{quant}$, outputs a distribution $P^{DP}$ such that with probability at least $1-O(\beta)$ over the randomness of ${\bf x}$ and the algorithm,

\[
\mathcal{W}(P,P^{DP})=O\left(\sqrt{\log n}\,\mathbb{E}\left[\mathcal{W}(P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}},\hat{P}_{n}|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})\right]+\mathcal{W}(P,P|_{q_{\frac{1}{k}},q_{1-\frac{1}{k}}})+\frac{1}{k}\left(q_{1-1/k}-q_{1/k}\right)\right)+\gamma
\]

where $\hat{P}_{n}$ is the uniform distribution on ${\bf x}$, $q_{\alpha}$ represents the $\alpha$-quantile of distribution $P$, and $k=\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$, where $c_{3}$ is a sufficiently large constant.

We state a lemma of Bobkov and Ledoux that we will use in the main proof of this section.

Lemma F.6 (Lemma 3.8 in [BL19]).

Let $S_{n}=\sum_{i=1}^{n}\eta_{i}$ be the sum of $n$ independent Bernoulli random variables with $\Pr[\eta_{i}=1]=p$ and $\Pr[\eta_{i}=0]=q=1-p$ (for all $i$). Then, for some sufficiently small constant $c$,

\[
c\min\{2npq,\sqrt{npq}\}\leq\mathbb{E}[|S_{n}-np|]\leq\min\{2npq,\sqrt{npq}\}
\]
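As a sanity check on the shape of this bound, the mean absolute deviation of a binomial can be computed exactly from its probability mass function and compared with $\min\{2npq,\sqrt{npq}\}$. The sketch below is illustrative only: the function names and the parameter grid are hypothetical choices, and the script says nothing about the actual value of the constant $c$ in the lemma beyond what it observes on this small grid.

```python
import math


def mean_abs_deviation(n: int, p: float) -> float:
    """Exact E|S_n - n p| for S_n ~ Binomial(n, p), summing over the pmf."""
    q = 1.0 - p
    return sum(
        math.comb(n, k) * p**k * q ** (n - k) * abs(k - n * p)
        for k in range(n + 1)
    )


def envelope(n: int, p: float) -> float:
    """The quantity min{2npq, sqrt(npq)} appearing on both sides of the lemma."""
    npq = n * p * (1.0 - p)
    return min(2.0 * npq, math.sqrt(npq))


def ratios(ns=(1, 5, 50, 200), ps=(0.01, 0.1, 0.5, 0.9)):
    """Map each (n, p) on a small grid to E|S_n - np| / min{2npq, sqrt(npq)}."""
    return {
        (n, p): mean_abs_deviation(n, p) / envelope(n, p)
        for n in ns
        for p in ps
    }


if __name__ == "__main__":
    for (n, p), r in sorted(ratios().items()):
        print(f"n={n:4d}  p={p:4.2f}  ratio={r:.3f}")
```

The upper bound corresponds to every ratio being at most 1, and the lower bound corresponds to the ratios staying bounded away from 0 as $n$ and $p$ vary.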

Now, we are ready to state and prove the local minimality result. Note that the statement will reference the rates defined by Equation 1 in the introduction.

Theorem F.7.

Let $a,b\in\mathbb{R}$, $\gamma\in\mathbb{R}$. For any continuous distribution $P$ over $[a,b]$ with a density, let $N(P)=\{Q:D_{\infty}(P,Q)\leq\log 2\}$. Fix $\beta,\gamma,\varepsilon\in(0,1]$, and let $n=\Omega\left(\frac{\log^{4}\frac{b-a}{\gamma\varepsilon}}{\varepsilon}\right)$, with $n^{\prime}=\frac{n}{c_{7}\log n\log^{3}\frac{b-a}{\gamma\varepsilon}}$ for some constant $c_{7}$. There exists an algorithm $\mathcal{A}$ such that for all continuous distributions $P$, for all algorithms $\mathcal{A}^{\prime}$, there exists a distribution $Q\in N(P)$ such that

\[
R_{\mathcal{A},n}(Q)\leq O(\operatorname{polylog}n)\cdot\max\{R_{\mathcal{A}^{\prime},\lceil n^{\prime}\rceil}(Q),R_{\mathcal{A}^{\prime},\lfloor n^{\prime}/4\rfloor}(Q)\}+\gamma.
\]
Proof.

Let $k=\left\lceil\frac{\varepsilon n}{4c_{3}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}\right\rceil$, and set $n^{\prime}=\frac{2n}{c_{4}\log^{3}\frac{b-a}{\beta\gamma}\log\frac{n}{\beta}}$ for a sufficiently large constant $c_{4}$. Then, by Corollary F.5 with appropriately chosen $\beta$, we have that with probability at least $0.95$, for any distribution $Q$ (and hence in particular any distribution $Q\in N(P)$),

\[
\begin{aligned}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n})) &=O\Bigg(\frac{1}{\varepsilon n^{\prime}}\left(q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)-q_{\frac{2}{C\varepsilon n^{\prime}}}(Q)\right)+\mathcal{W}\left(Q,Q|_{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q),q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)}\right)\\
&\qquad+\sqrt{\log n}\,\mathbb{E}\left[\mathcal{W}\left(Q|_{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q),q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)},\hat{Q}_{n}|_{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q),q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)}\right)\right]\Bigg)+\gamma,
\end{aligned}
\]

where $C$ is the constant referenced in Theorem 6.3. We will show that for distribution $P$, each of the corresponding distribution-dependent terms is closely related to the terms for $Q$.

First, consider $\frac{1}{\varepsilon n^{\prime}}\left(q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)-q_{\frac{2}{C\varepsilon n^{\prime}}}(Q)\right)$. Note that for all $\alpha\in(0,1)$, $q_{\alpha}(P)\geq q_{\alpha/2}(Q)$ and $q_{\alpha}(P)\leq q_{2\alpha}(Q)$, since $D_{\infty}(P,Q)\leq\ln 2$, which implies that $\frac{1}{2}F_{Q}(t)\leq F_{P}(t)\leq 2F_{Q}(t)$ for all $t\in\mathbb{R}$.
Similarly, note that for all $\alpha\in(0,1)$, $q_{1-\alpha}(P)\geq q_{1-2\alpha}(Q)$ and $q_{1-\alpha}(P)\leq q_{1-\frac{1}{2}\cdot\alpha}(Q)$. Hence, we have that

\[
\frac{1}{\varepsilon n^{\prime}}\left(q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)-q_{\frac{2}{C\varepsilon n^{\prime}}}(Q)\right)\leq\frac{1}{\varepsilon n^{\prime}}\left(q_{1-\frac{1}{C\varepsilon n^{\prime}}}(P)-q_{\frac{1}{C\varepsilon n^{\prime}}}(P)\right)
\]
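The four quantile comparisons used in this step can be checked on a concrete pair of distributions. The sketch below is a hypothetical example, not taken from the paper: it assumes $P$ is Uniform$[0,1]$ and $Q$ has density $1.5$ on $[0,0.5]$ and $0.5$ on $(0.5,1]$, so that the density ratio lies in $[\frac{1}{2},2]$ in both directions and hence $D_{\infty}(P,Q)\leq\ln 2$. The function names are illustrative.

```python
def quantile_p(alpha: float) -> float:
    """Quantile function of P = Uniform[0, 1]."""
    return alpha


def quantile_q(alpha: float) -> float:
    """Quantile function of Q, whose density is 1.5 on [0, 0.5] and 0.5 on (0.5, 1].

    The CDF of Q is 1.5 t for t <= 0.5 and 0.75 + 0.5 (t - 0.5) otherwise.
    """
    if alpha <= 0.75:
        return alpha / 1.5
    return 0.5 + (alpha - 0.75) / 0.5


def sandwich_holds(alpha: float, tol: float = 1e-12) -> bool:
    """Check the four quantile inequalities for a level alpha in (0, 1/2)."""
    lower_p = quantile_p(alpha)       # q_alpha(P)
    upper_p = quantile_p(1 - alpha)   # q_{1 - alpha}(P)
    return (
        quantile_q(alpha / 2) <= lower_p + tol            # q_{alpha/2}(Q) <= q_alpha(P)
        and lower_p <= quantile_q(2 * alpha) + tol        # q_alpha(P) <= q_{2 alpha}(Q)
        and quantile_q(1 - 2 * alpha) <= upper_p + tol    # q_{1-2 alpha}(Q) <= q_{1-alpha}(P)
        and upper_p <= quantile_q(1 - alpha / 2) + tol    # q_{1-alpha}(P) <= q_{1-alpha/2}(Q)
    )


if __name__ == "__main__":
    alphas = [i / 1000 for i in range(1, 500)]
    print(all(sandwich_holds(a) for a in alphas))
```

The levels are restricted to $\alpha<\frac{1}{2}$ so that $2\alpha$ and $1-2\alpha$ are valid quantile levels; the small tolerance absorbs floating-point rounding in the cases where an inequality holds with equality.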

Next, consider $\mathcal{W}(P,P|_{q_{\frac{1}{C\varepsilon n}}(P),q_{1-\frac{1}{C\varepsilon n}}(P)})$. Recall that $q_{\frac{1}{C\varepsilon n}}(P)\leq q_{\frac{2}{C\varepsilon n}}(Q)$, and $q_{1-\frac{1}{C\varepsilon n}}(P)\geq q_{1-\frac{2}{C\varepsilon n}}(Q)$. Then (noting that $L(P)=L(Q)$ and $q_{1}(P)=q_{1}(Q)$), we have that

\[
\begin{aligned}
\mathcal{W}(Q,Q|_{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q),q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)}) &=\int_{L(Q)}^{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q)}F_{Q}(t)\,dt+\int_{q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)}^{q_{1}(Q)}|1-F_{Q}(t)|\,dt\\
&\leq 2\int_{L(Q)}^{q_{\frac{2}{C\varepsilon n^{\prime}}}(Q)}F_{P}(t)\,dt+2\int_{q_{1-\frac{2}{C\varepsilon n^{\prime}}}(Q)}^{q_{1}(Q)}|1-F_{P}(t)|\,dt\\
&\leq 2\int_{L(P)}^{q_{\frac{4}{C\varepsilon n^{\prime}}}(P)}F_{P}(t)\,dt+2\int_{q_{1-\frac{4}{C\varepsilon n^{\prime}}}(P)}^{q_{1}(P)}|1-F_{P}(t)|\,dt\\
&=2\mathcal{W}(P,P|_{q_{\frac{4}{C\varepsilon n^{\prime}}}(P),q_{1-\frac{4}{C\varepsilon n^{\prime}}}(P)})
\end{aligned}
\]

Finally, consider 1logn𝔼[𝒲(P|q1Cεn(P),q11Cεn(P),P^n|q1Cεn(P),q11Cεn(P))]1𝑛𝔼delimited-[]𝒲evaluated-at𝑃subscript𝑞1𝐶𝜀𝑛𝑃subscript𝑞11𝐶𝜀𝑛𝑃evaluated-atsubscript^𝑃𝑛subscript𝑞1𝐶𝜀𝑛𝑃subscript𝑞11𝐶𝜀𝑛𝑃\frac{1}{\sqrt{\log n}}\mathbb{E}\left[\mathcal{W}(P|_{q_{\frac{1}{C% \varepsilon n}(P)},q_{1-\frac{1}{C\varepsilon n}}(P)},\hat{P}_{n}|_{q_{\frac{1% }{C\varepsilon n}(P)},q_{1-\frac{1}{C\varepsilon n}}(P)})\right]divide start_ARG 1 end_ARG start_ARG square-root start_ARG roman_log italic_n end_ARG end_ARG blackboard_E [ caligraphic_W ( italic_P | start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_C italic_ε italic_n end_ARG ( italic_P ) end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_C italic_ε italic_n end_ARG end_POSTSUBSCRIPT ( italic_P ) end_POSTSUBSCRIPT , over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_C italic_ε italic_n end_ARG ( italic_P ) end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT 1 - divide start_ARG 1 end_ARG start_ARG italic_C italic_ε italic_n end_ARG end_POSTSUBSCRIPT ( italic_P ) end_POSTSUBSCRIPT ) ]. By Fubini’s theorem and applying both inequalities in Lemma F.6, we have that

\begin{align*}
&\mathbb{E}\left[\mathcal{W}\left(Q|_{q_{\frac{2}{C\varepsilon n'}}(Q),\,q_{1-\frac{2}{C\varepsilon n'}}(Q)},\ \hat{Q}_{n}|_{q_{\frac{2}{C\varepsilon n'}}(Q),\,q_{1-\frac{2}{C\varepsilon n'}}(Q)}\right)\right] \\
&=\int_{q_{\frac{2}{C\varepsilon n'}}(Q)}^{q_{1-\frac{2}{C\varepsilon n'}}(Q)}\mathbb{E}\left[|F_{Q}(t)-F_{\hat{Q}_{n}}(t)|\right]dt \\
&\leq\int_{q_{\frac{1}{C\varepsilon n'}}(P)}^{q_{1-\frac{1}{C\varepsilon n'}}(P)}\mathbb{E}\left[|F_{Q}(t)-F_{\hat{Q}_{n}}(t)|\right]dt \\
&\leq\int_{q_{\frac{1}{C\varepsilon n'}}(P)}^{q_{1-\frac{1}{C\varepsilon n'}}(P)}\min\left\{2F_{Q}(t)[1-F_{Q}(t)],\ \sqrt{\frac{F_{Q}(t)[1-F_{Q}(t)]}{n}}\right\}dt \\
&\leq\int_{q_{\frac{1}{C\varepsilon n'}}(P)}^{q_{1-\frac{1}{C\varepsilon n'}}(P)}\min\left\{8F_{P}(t)[1-F_{P}(t)],\ 2\sqrt{\frac{F_{P}(t)[1-F_{P}(t)]}{n}}\right\}dt \\
&\leq\int_{q_{\frac{1}{C\varepsilon n'}}(P)}^{q_{1-\frac{1}{C\varepsilon n'}}(P)}\min\left\{8F_{P}(t)[1-F_{P}(t)],\ 2\sqrt{\frac{F_{P}(t)[1-F_{P}(t)]}{\lceil n'\rceil}}\right\}dt \\
&\leq c_{5}\int_{q_{\frac{1}{C\varepsilon n'}}(P)}^{q_{1-\frac{1}{C\varepsilon n'}}(P)}\mathbb{E}\left[|F_{P}(t)-F_{\hat{P}_{\lceil n'\rceil}}(t)|\right]dt \\
&=c_{5}\,\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon n'}}(P),\,q_{1-\frac{1}{C\varepsilon n'}}(P)},\ \hat{P}_{\lceil n'\rceil}|_{q_{\frac{1}{C\varepsilon n'}}(P),\,q_{1-\frac{1}{C\varepsilon n'}}(P)}\right)\right],
\end{align*}

where $c_{5}$ is a sufficiently large constant and the fourth inequality holds since $\lceil n'\rceil\leq n$.
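The second inequality in the chain bounds the pointwise deviation of an empirical CDF. Since $n\,F_{\hat{Q}_{n}}(t)$ is a Binomial$(n,F_{Q}(t))$ random variable, that bound can be sanity-checked numerically. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it checks only the upper bound $\min\{2p(1-p),\sqrt{p(1-p)/n}\}$ as it appears in the display, which may differ in form from the exact statement of Lemma F.6.

```python
import math


def binomial_mad(n: int, p: float) -> float:
    """Exact E|X/n - p| for X ~ Binomial(n, p), i.e. E|F_hat_n(t) - F(t)| with p = F(t)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * abs(k / n - p)
        for k in range(n + 1)
    )


def upper_bound(n: int, p: float) -> float:
    """The integrand bound min{2p(1-p), sqrt(p(1-p)/n)} used in the display."""
    v = p * (1 - p)
    return min(2 * v, math.sqrt(v / n))


def check_bounds(ns, ps, tol: float = 1e-12) -> bool:
    """True if the exact deviation never exceeds the bound on the given grid."""
    return all(
        binomial_mad(n, p) <= upper_bound(n, p) + tol for n in ns for p in ps
    )


if __name__ == "__main__":
    ns = [1, 2, 5, 20, 100]
    ps = [0.001, 0.01, 0.1, 0.3, 0.5, 0.9, 0.999]
    print("bound holds on grid:", check_bounds(ns, ps))
    # The bound is nonincreasing in n, which is what allows n to be
    # replaced by a smaller sample size in the fourth inequality.
    print(upper_bound(10, 0.3) <= upper_bound(4, 0.3))
```

The first branch of the minimum is tight at $n=1$, where the deviation equals $2p(1-p)$ exactly; the second branch follows from bounding the mean absolute deviation by the standard deviation.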

By the above observations connecting the distribution-dependent terms with the corresponding terms for $P$, we have that for all $Q$, with probability at least $0.95$,

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n})) &=O\Bigg(\frac{1}{\varepsilon n'}\left(q_{1-\frac{1}{C\varepsilon n'}}(P)-q_{\frac{1}{C\varepsilon n'}}(P)\right)+\mathcal{W}\left(P,P|_{q_{\frac{4}{C\varepsilon n'}}(P),\,q_{1-\frac{4}{C\varepsilon n'}}(P)}\right) \\
&\quad+\sqrt{\log n}\,\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon n'}}(P),\,q_{1-\frac{1}{C\varepsilon n'}}(P)},\ \hat{P}_{\lceil n'\rceil}|_{q_{\frac{1}{C\varepsilon n'}}(P),\,q_{1-\frac{1}{C\varepsilon n'}}(P)}\right)\right]\Bigg)+\gamma \\
&=O(\log n)\Bigg(\frac{1}{\varepsilon\lceil n'\rceil}\left(q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)-q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P)\right)+\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P),\,q_{1-\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P)}\right) \tag{20} \\
&\quad+\frac{1}{\sqrt{\log n}}\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)},\ \hat{P}_{\lceil n'\rceil}|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)}\right)\right]\Bigg)+\gamma
\end{align*}
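The passage from the first to the second expression is not spelled out. One way to read it, sketched here under assumptions not stated in this excerpt (that $n'\geq 4$, that $\log n\geq 1$, and that $\mathcal{W}(P,P|_{q_{\alpha}(P),\,q_{1-\alpha}(P)})$ is nondecreasing in $\alpha$, as its tail-integral form suggests), is term by term:

\begin{align*}
\frac{1}{\varepsilon n'}\leq\frac{2}{\varepsilon\lceil n'\rceil}
\quad&\text{and}\quad
q_{1-\frac{1}{C\varepsilon n'}}(P)-q_{\frac{1}{C\varepsilon n'}}(P)\leq q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)-q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P), \\
\frac{4}{C\varepsilon n'}\leq\frac{1}{C\varepsilon\lfloor n'/4\rfloor}
\quad&\Longrightarrow\quad
\mathcal{W}\left(P,P|_{q_{\frac{4}{C\varepsilon n'}}(P),\,q_{1-\frac{4}{C\varepsilon n'}}(P)}\right)\leq\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P),\,q_{1-\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P)}\right), \\
\sqrt{\log n}\;\mathbb{E}[\,\cdot\,]&=\log n\cdot\frac{1}{\sqrt{\log n}}\,\mathbb{E}[\,\cdot\,].
\end{align*}

In the third term, widening the quantile range from $n'$ to $\lceil n'\rceil$ can only enlarge the integral of a nonnegative integrand, so every term of the first expression is dominated, up to constants and a factor of $\log n$, by the matching term of the second.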

Now, we proceed with the analysis in two cases. Firstly, consider the case when the first and third terms inside the bracket on the RHS of equation 20 are larger than the second term inside the bracket. Then, we have that for all $Q$, with probability at least $0.95$,

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n})) &=O(\log n)\Bigg(\frac{1}{\varepsilon\lceil n'\rceil}\left(q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)-q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P)\right) \\
&\quad+\frac{1}{\sqrt{\log n}}\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)},\ \hat{P}_{\lceil n'\rceil}|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)}\right)\right]\Bigg)+\gamma.
\end{align*}

By Theorem 6.3 and the fact that $n'<n$, for all algorithms $\mathcal{A}'$, there exists a distribution $Q\in N(P)$ such that

\begin{align*}
R_{Q}(\mathcal{A}',\lceil n'\rceil) &=\Omega\Bigg(\frac{1}{\varepsilon\lceil n'\rceil}\left(q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)-q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P)\right) \\
&\quad+\frac{1}{\sqrt{\log n}}\mathbb{E}\left[\mathcal{W}\left(P|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)},\ \hat{P}_{\lceil n'\rceil}|_{q_{\frac{1}{C\varepsilon\lceil n'\rceil}}(P),\,q_{1-\frac{1}{C\varepsilon\lceil n'\rceil}}(P)}\right)\right]\Bigg).
\end{align*}

Hence, for all algorithms $\mathcal{A}'$ and the corresponding distribution $Q$, with probability at least $0.95$,

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n}))\leq O(\log n)\,R_{Q}(\mathcal{A}',\lceil n'\rceil)+\gamma.
\end{align*}
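To make the combination of the upper and lower bounds explicit, write $T_{1},T_{2},T_{3}$ for the three terms inside the bracket of equation 20, and let $c,c'>0$ denote the constants hidden in the $O(\cdot)$ and $\Omega(\cdot)$ above. These names are introduced here for illustration only. Reading the case condition as $T_{2}\leq T_{1}+T_{3}$, the chain is:

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n}))
&\leq c\log n\,(T_{1}+T_{2}+T_{3})+\gamma \\
&\leq 2c\log n\,(T_{1}+T_{3})+\gamma \\
&\leq \frac{2c}{c'}\log n\;R_{Q}(\mathcal{A}',\lceil n'\rceil)+\gamma .
\end{align*}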

Next, consider the case where the first and third terms inside the bracket on the RHS of equation 20 are smaller than the second term inside the bracket. Then, we have that for all $Q$, with probability at least $0.95$,

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n})) &=O(\log n)\,\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P),\,q_{1-\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P)}\right)+\gamma.
\end{align*}

By Theorem 6.3, for all algorithms $\mathcal{A}'$, there exists a distribution $Q\in N(P)$ such that

\[
R_{Q}(\mathcal{A}',\lfloor n'/4\rfloor)=\Omega\left(\mathcal{W}\left(P,P|_{q_{\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P),\,q_{1-\frac{1}{C\varepsilon\lfloor n'/4\rfloor}}(P)}\right)\right).
\]

Hence, we have that for all algorithms $\mathcal{A}'$ and for the corresponding distribution $Q$, with probability at least $0.95$,

\begin{align*}
\mathcal{W}(Q,\mathcal{A}(\hat{Q}_{n})) &=O(\log n)\,R_{Q}(\mathcal{A}',\lfloor n'/4\rfloor)+\gamma,
\end{align*}

as required. This completes the proof. ∎