Metric Differential Privacy at the User-Level

Jacob Imola (University of Copenhagen, Copenhagen, Denmark), Amrita Roy Chowdhury (UCSD, La Jolla, California, USA), and Kamalika Chaudhuri (UCSD, La Jolla, California, USA)


Abstract.

Metric differential privacy (DP) provides heterogeneous privacy guarantees based on a distance between the pair of inputs. It is a widely popular notion of privacy since it captures the natural privacy semantics for many applications (such as location data) and results in better utility than standard DP. However, prior work in metric DP has primarily focused on the item-level setting where every user only reports a single data item. A more realistic setting is that of user-level DP, where each user contributes multiple items and privacy is then desired at the granularity of the user’s entire contribution. In this paper, we initiate the study of metric DP at the user-level. Specifically, we use the earth-mover’s distance ( $d_{\textsf{EM}}$ ) as our metric to obtain a notion of privacy, as it captures both the magnitude and spatial aspects of changes in a user’s data.

We make three main technical contributions. First, we design two novel mechanisms under $d_{\textsf{EM}}$ -DP to answer linear queries and item-wise queries. Specifically, our analysis for the latter involves a generalization of the privacy amplification by shuffling result, which may be of independent interest. Second, we provide a black-box reduction from the general unbounded to bounded $d_{\textsf{EM}}$ -DP (where the size of the dataset is fixed and public) with a novel sampling-based mechanism. Third, we show that our proposed mechanisms can provably provide improved utility over user-level DP for certain types of linear queries and frequency estimation.

User-level Differential Privacy, Earth-Mover’s Distance, Couplings

CCS Concepts: Security and privacy; Theory of computation → Design and analysis of algorithms

1. Introduction

Differential privacy (DP) is the state-of-the-art technique that enables useful data analysis while still providing a strong privacy guarantee at the granularity of individuals (Dwork, 2006). Over nearly two decades, DP has enjoyed significant academic attention and has proven its efficacy in practical applications as well. It has been successfully deployed in diverse settings, including the US census (Abowd, 2018), Apple’s iOS platform (Cormode et al., 2018), and Google Chrome (Erlingsson et al., 2014).

Intuitively, a DP guarantee makes a pair of inputs indistinguishable from each other. The standard DP guarantee requires all pairs of inputs to be indistinguishable, thereby providing a uniform privacy guarantee to all pairs. This implies that every pair of inputs is considered equally sensitive. However, many practical applications call for more tailored privacy semantics based on the heterogeneity of the data. In particular, input pairs that are closer or more similar to each other are considered to be more sensitive. For instance, for location data, revealing the exact city of residence is far more sensitive than revealing just the country. Metric DP ( $d_{\mathcal{X}}$ -DP; (Chatzikokolakis et al., 2013)) is a notion of DP that formally captures this heterogeneity in privacy semantics. Specifically, similarity is measured via a distance metric $d_{\mathcal{X}}$ and the privacy guarantee degrades linearly with the $d_{\mathcal{X}}$ distance between the pair of inputs. In addition to offering a more nuanced privacy definition, metric DP also improves utility compared to standard DP. This improvement stems from metric DP requiring only similar pairs of inputs to be indistinguishable, which results in significantly lower noise than standard DP.

Prior work in metric DP has primarily focused on the item-level setting where every user only reports a single data item (e.g., a single record in a dataset). However, in many practical applications, a user contributes multiple items to a dataset. Privacy is then desired at the granularity of the user’s entire contribution. This has spurred a large body of work known as user-level DP (Amin et al., 2019; Bassily and Sun, 2023; Cummings et al., 2022; Acharya et al., 2023). However, all of this work considers only standard DP and is thus susceptible to the same limitations in utility as noted earlier. To this end, we initiate the study of metric DP at the user-level. While there have been some prior attempts at this, that work is limited to specific settings such as text data (Fernandes et al., 2019). To the best of our knowledge, this is the first work to give a general definition of metric DP at the user-level.

The immediate task is to define a metric on the entire collection of a user’s data. Recall that metric DP caters to the privacy semantics that similar data is more sensitive. The challenge here is that the similarity between two collections (sets) of data points has to be measured along two dimensions: $(1)$ the distance between the individual data items, and $(2)$ the fraction of the data items in the set that are different. In particular, note that in addition to small changes in the item-wise distances, changes in a smaller amount of the data also indicate more similarity and hence correspond to more sensitive information (see below for concrete examples). This necessitates a measure that can express both of these quantities as a single metric. We tackle this challenge by using the earth-mover’s distance ( $d_{\textsf{EM}}$ ; (Givens and Shortt, 1984)) on the normalized representation of the user’s data. Informally, the $d_{\textsf{EM}}$ between two distributions is the minimum cost of transporting one distribution to another, where the cost is determined by the quantity of data items moved multiplied by the distance (measured via $d_{\mathcal{X}}$ ) over which they are moved. Our resulting privacy definition, denoted as $d_{\textsf{EM}}$ -DP, yields the following privacy semantics. Under $d_{\textsf{EM}}$ -DP, the strength of the privacy guarantee (indistinguishability) between a pair of inputs $K,K^{\prime}$ (sets of data items) grows inversely with $\tau q$ if $K^{\prime}$ can be obtained by changing a $\tau$ fraction of $K$ by an average distance of $q$ (Def. 3.1). $d_{\textsf{EM}}$ therefore takes into account both the structure of the distributions and the raw difference in their values. Consequently, the parameters $\tau$ and $q$ provide flexibility in interpretation and offer a nuanced privacy definition suitable for many practical applications. We illustrate this with the following examples:
Location Data. We use a location dataset as a canonical example throughout the paper. Suppose that the location dataset consists of daily locations of users collected over a period of time. Here, the parameter $\tau$ can be interpreted in terms of the length of the time window the change in $K^{\prime}$ pertains to, and $q$ corresponds to the extent of change in the location. Then, $d_{\textsf{EM}}$ -DP makes it harder to distinguish between locations that are $(1)$ close to each other, and $(2)$ collected over a smaller time window. This is natural, since locations gathered over an extended period, such as a month, may reveal routine patterns that are less sensitive than locations recorded on a single day (for instance, a single-day location might reveal a non-routine visit to a friend or hospital).

Textual Data. Consider a natural language dataset of user conversations where each user’s data is represented as a set of words. Typically, word embeddings $\phi$ map each word into a high-dimensional space, and word similarity is measured using a distance, such as the Euclidean distance, between $\phi(x_{1})$ and $\phi(x_{2})$ . Now, the parameter $\tau$ corresponds to the fraction of the user’s conversation that has changed in $K^{\prime}$ from $K$ , while $q$ corresponds to the extent of the changes in the textual content. Thus, two conversations are harder to distinguish if $(1)$ there is only a fine-grained difference in their textual semantics (such as moving from text about algebra to text about trigonometry, as opposed to moving from “math” to “classical music”), and $(2)$ the difference pertains to just a small fraction of the conversation (indicating a user rarely discussed the topic, which typically implies more sensitive information).

Graph Data. Consider a graph $G=(V,E)$ in which connections in $E$ are private. Suppose there is additional public information in the form of a covariate $\phi:V\rightarrow\mathbb{R}^{d}$ , which captures some auxiliary information about a user, for instance, the interests of a user. Here similarity between users is measured via covariate distance. The parameter $\tau$ corresponds to the fraction of a user’s connections that have changed in $K^{\prime}$ from $K$ , and the parameter $q$ corresponds to the extent of the change in their interests. Thus, two graphs are harder to distinguish if $(1)$ the change to the interests is fine-grained (for instance, shifting from movies featuring Dwayne Johnson to movies featuring Vin Diesel, rather than from “action” to “rom-com”), and $(2)$ the change pertains to only a few of the user’s connections (say, a small, private group of friends). This again captures natural privacy semantics, as users are more likely to share common interests with their close friends than with a larger group, such as all workplace colleagues.

1.1. Details of Our Contributions

We consider $n$ users who hold datasets $\{K_{i}\}_{i=1}^{n}$ , each containing elements from a data domain $\mathcal{X}$ of size $k=|\mathcal{X}|$ . Let $d_{\mathcal{X}}$ denote the distance metric defined over $\mathcal{X}$ . WLOG, we consider $d_{\mathcal{X}}$ to be a normalized distance metric, i.e., all measures of distance are normalized to be at most $1$ . Let $\tilde{K}_{i}$ denote the normalized version of the dataset $K_{i}$ . $d_{\textsf{EM}}$ between any pair of datasets $\{K_{i},K_{i}^{\prime}\}$ can be defined by first normalizing them to $\{\tilde{K}_{i},\tilde{K}_{i}^{\prime}\}$ , and then using $d_{\mathcal{X}}$ to measure the minimum cost of transporting $\tilde{K}_{i}$ to $\tilde{K}_{i}^{\prime}$ . The global dataset is given by $K_{G}=K_{1}\cup\cdots\cup K_{n}$ , and there is an aggregator who wants to privately compute a query $F(K)$ . In the central model, the aggregator already holds $K_{i}$ from each user, and applies a private mechanism $\mathcal{M}(K)$ to obtain a private estimate for $F$ . In the local model, the users do not trust the aggregator, and communicate private messages $\{m_{i}=\mathcal{M}_{i}(K_{i})\}$ to the aggregator. The aggregator then post-processes these messages $\mathcal{F}(m_{1},\ldots,m_{n})$ to output a private estimate of $F$ . For simplicity, in this work we assume the mechanisms $\mathcal{M}_{i}$ to be non-interactive.

We also make a distinction between bounded and unbounded data. Note that boundedness here refers to the size of each user’s dataset and not the number of the users – throughout the paper, we assume that the number of users, $n$ , is fixed and publicly known. In our specific context, bounded data corresponds to the case where the size of each user’s dataset is publicly known, and the mechanism $\mathcal{M}$ only needs to preserve privacy between datasets of the same size. Furthermore, in the central model, each user’s dataset has the same public size. The benefit of this simplification is that algorithm analysis is easier. Such a bounded data setting has been considered in many previous works (Li et al., 2016). We also consider the general unbounded data setting where each user can have datasets of varying sizes, with the size being private as well.

For each model and type of boundedness, we summarize how one would apply $d_{\textsf{EM}}$ -DP, along with the resulting semantics, in Table 1. We also include the corresponding notion of standard user-level DP (Liu et al., 2023), which provides a uniform privacy guarantee to all pairs of datasets and serves as our baseline. In what follows, we elaborate on our main contributions.

Model	Granularity	Data Boundedness	Privacy Guarantee	Semantics	Notes
Local (applies to each $\mathcal{M}_{i}$ )	User	Unbounded	$(\varepsilon,\delta)$ -user-level DP (Def. 2.1)	Two input datasets $K,K^{\prime}\in\mathcal{X}^{*}$ are indistinguishable with parameters $(\varepsilon,\delta)$ .	Recently proposed in (Acharya et al., 2023). Acts as our baseline for the local model.
	User	Bounded	$(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP (Def. 3.1)	Two input datasets $K,K^{\prime}\in\mathcal{X}^{m}$ are indistinguishable with parameters $(\alpha d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime}),\delta)$ .	The size of each dataset, $m$ , is public. Proofs of privacy are easier due to Lemma 2.1.
	User	Unbounded	$(\alpha,\delta)$ -unbounded $d_{\textsf{EM}}$ -DP (Def. 3.1)	Two input datasets $K,K^{\prime}\in\mathcal{X}^{*}$ are indistinguishable with parameters $(\alpha d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime}),\delta)$ .	Implies user-level DP when $\alpha\leq\varepsilon$ since $d_{\textsf{EM}}(\cdot,\cdot)\leq 1$ .
	Item	N/A	$(\alpha,\delta)$ - $d_{\mathcal{X}}$ -DP (Def. 2.3)	Two input items $x,x^{\prime}\in\mathcal{X}$ are protected with parameters $(\alpha d_{\mathcal{X}}(x,x^{\prime}),\delta)$ .	Proposed in (Chatzikokolakis et al., 2015).
Central (applies to $\mathcal{M}$ )	User	Unbounded	$(\varepsilon,\delta)$ -user-level DP (Def. 2.2)	Let $K_{G}=K_{1}\cup\cdots\cup K_{n}$ where $K_{i}\in\mathcal{X}^{*}$ . Two input global datasets $K_{G},K_{G}^{\prime}$ that differ only on the dataset of a single user $\{K_{i},K_{i}^{\prime}\},i\in[n]$ are indistinguishable with parameters $(\varepsilon,\delta)$ .	Studied widely (Bassily and Sun, 2023; Liu et al., 2020, 2023). Acts as our baseline for the central model.
	User	Bounded	$(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP (Def. 3.1)	Two input global datasets $K_{G},K_{G}^{\prime}$ that differ only on $\{K_{i},K_{i}^{\prime}\}\in\mathcal{X}^{m}\times\mathcal{X}^{m}$ are indistinguishable with parameters $(\alpha d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime}),\delta)$ .	Each $K_{i}$ has size $m$ , which is public.
	User	Unbounded	$(\varepsilon,\delta,r)$ -discrete $d_{\textsf{EM}}$ -DP (Def. 3.2)	Two input global datasets $K_{G},K_{G}^{\prime}$ that differ only on $\{K_{i},K_{i}^{\prime}\}$ with $d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})\leq r$ are indistinguishable with parameters $(\varepsilon,\delta)$ .	Using group privacy, pairs at larger distance are indistinguishable with parameters $(\varepsilon\lceil\frac{d}{r}\rceil,\delta\exp(\varepsilon\lceil\frac{d}{r}\rceil))$ , where $d=d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})$ . Implies user-level DP when $r\geq 1$ since $d_{\textsf{EM}}(\cdot,\cdot)\leq 1$ .

Table 1. Summary of privacy definitions for this paper. The number of users, $n$ , is fixed and publicly known for all the definitions.
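
To make the group-privacy note in the last row of Table 1 concrete, the following minimal Python sketch evaluates the stated parameters $(\varepsilon\lceil\frac{d}{r}\rceil,\delta\exp(\varepsilon\lceil\frac{d}{r}\rceil))$ for a pair of datasets at distance $d$ . The function name and the numeric values are illustrative assumptions and do not come from the paper.

```python
import math

def group_privacy_params(eps, delta, r, d):
    """Parameters implied by group privacy for a pair at d_EM distance d.

    Assumes a mechanism that is (eps, delta)-indistinguishable for every
    pair at distance at most r, as in the discrete d_EM-DP row of Table 1.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    steps = math.ceil(d / r)  # number of hops of length at most r
    return eps * steps, delta * math.exp(eps * steps)

# Illustrative values only: a pair at distance 0.25 with threshold r = 0.1
# needs ceil(2.5) = 3 hops.
eps_d, delta_d = group_privacy_params(eps=0.5, delta=1e-6, r=0.1, d=0.25)
```

Since $d_{\textsf{EM}}\leq 1$ , the number of hops is at most $\lceil 1/r\rceil$ , so a smaller $r$ gives finer granularity at the cost of a larger worst-case parameter.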

1.1.1. Mechanism Design

We provide novel mechanisms for answering two types of queries for $d_{\textsf{EM}}$ -DP.

Linear Query

First, we study how to release linear queries $V\tilde{K}_{G}$ , where $\tilde{K}_{G}\in\mathbb{R}^{\mathcal{X}}$ is the normalized representation of the global dataset $K_{G}$ and $V\in\mathbb{R}^{d\times\mathcal{X}}$ is a real-valued matrix with bounded entries. While computing the sensitivity of a linear query is easy under user-level DP, proving a sensitivity under $d_{\textsf{EM}}$ -DP is quite challenging. Specifically, it requires analysis of a coupling between two possible datasets, along with a stronger assumption that $V$ is “Lipschitz” in a sense, rather than just being bounded. To this end, we first prove the following bound:

Theorem 1.1.

(Informal version of Thm. 4.1): The sensitivity of $V\tilde{K}_{G}$ is upper bounded by

\max_{K,K^{\prime}}\frac{\|V\tilde{K}_{G}-V\tilde{K}_{G}^{\prime}\|}{d_{\textsf{EM}}(\tilde{K}_{G},\tilde{K}_{G}^{\prime})}\leq\max_{x,x^{\prime}\in\mathcal{X}}\frac{\|V[x]-V[x^{\prime}]\|}{d_{\mathcal{X}}(x,x^{\prime})},

where the notation $V[x]$ indicates the column of $V$ indexed by $x$ .

Using the above result, we show that the sensitivity of $V$ , which is a maximum over the space of all datasets, can be reduced to a Lipschitz property of $V$ that is much easier to compute. In Sec. 6.1, we show that a special class of linear queries, which we call linear embedding queries, satisfies the above mentioned Lipschitzness and can provide provably better utility than user-level DP.
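
As a minimal sketch of why the right-hand side of Theorem 1.1 is easy to compute, the following Python snippet evaluates $\max_{x,x^{\prime}}\|V[x]-V[x^{\prime}]\|/d_{\mathcal{X}}(x,x^{\prime})$ by enumerating pairs of items. The choice of the $\ell_{2}$ norm, the toy three-point domain, and the matrix `V` are assumptions made purely for illustration.

```python
import math
from itertools import combinations

def lipschitz_bound(V, dist):
    """Max over distinct x, x' of ||V[x] - V[x']||_2 / dist(x, x').

    V maps each item x to its column V[x] (a list of floats);
    dist is the item-level metric d_X.
    """
    best = 0.0
    for x, y in combinations(V, 2):
        d = dist(x, y)
        if d == 0:
            continue  # a metric is zero only for identical items
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(V[x], V[y])))
        best = max(best, diff / d)
    return best

# Toy domain: three points on a line, distances normalized to at most 1.
points = {"a": 0.0, "b": 0.5, "c": 1.0}
dist = lambda x, y: abs(points[x] - points[y])
# Hypothetical query whose columns depend linearly on the point position.
V = {x: [2.0 * p, -1.0 * p] for x, p in points.items()}
bound = lipschitz_bound(V, dist)
```

The loop runs over $O(k^{2})$ pairs of items, in contrast to the left-hand side of the bound, which ranges over all pairs of datasets.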

Unordered Release of Item-wise Queries

We design a mechanism for performing item-wise queries on the entire dataset $K$ . Our approach is to simply apply a private mechanism $\mathcal{A}$ to each item $k_{i}\in K$ and release the set of noisy outputs $\{\mathcal{A}(k_{i})\}$ after shuffling them. Here $\mathcal{A}$ can be an arbitrary mechanism satisfying $(\alpha,0)$ - $d_{\mathcal{X}}$ -DP, which makes our mechanism completely general-purpose (see Sec. 4.2 for some concrete examples of $\mathcal{A}$ ). We consider the bounded data setting since the size of $K$ is revealed. The main technical novelty lies in providing a tight privacy analysis of the above mechanism. Specifically, prior work shows that the above mechanism satisfies bounded $(m\alpha,0)$ - $d_{\textsf{EM}}$ -DP (Fernandes et al., 2019) by using the interplay between couplings and privacy via composition. However, we show that composition is not the right tool for a tight privacy analysis since it does not take into account that the output of our mechanism is an unordered list, i.e., the $\mathcal{A}(k_{i})$ s are released in a random order. Instead, we generalize a tight result from privacy amplification by shuffling (Feldman et al., 2022) to metric DP.

Theorem 1.2.

(Informal version of Thm. 4.3) Suppose that $\mathcal{A}:\mathcal{X}\rightarrow\mathcal{Y}$ is an $\alpha$ - $d_{\mathcal{X}}$ -DP algorithm. Let $(x_{1},\ldots,x_{m})\in\mathcal{X}^{m}$ be a dataset. Then, releasing $\mathsf{Shuffle}(\mathcal{A}(x_{1}),\ldots,\mathcal{A}(x_{m}))$ satisfies $(O(\alpha\sqrt{me^{\alpha}\ln(m/\delta)}),\delta e^{\alpha})$ - $d_{\textsf{EM}}$ -DP.

This analysis reduces the cost of releasing $m$ points in the multiset from $m\alpha$ to roughly $\sqrt{m}\alpha$ (up to the $e^{\alpha}$ and logarithmic factors in Theorem 1.2), allowing for better utility. We keep the analysis general: we consider releasing the shuffled multiset of any black-box mechanism $\mathcal{A}$ that satisfies metric DP in the data domain $\mathcal{X}$ , applied to each data point. Consequently, this result has broader applications to the shuffle model of privacy, and may be of independent interest.
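
The release step itself is simple, and the following Python sketch illustrates it under stated assumptions: items are scalars in $[0,1]$ with $d_{\mathcal{X}}(x,x^{\prime})=|x-x^{\prime}|$ , and the item-wise mechanism $\mathcal{A}$ adds Laplace noise of scale $1/\alpha$ , which satisfies $\alpha$ - $d_{\mathcal{X}}$ -DP for this metric. The function names and the choice of $\mathcal{A}$ are illustrative; the paper treats $\mathcal{A}$ as a black box.

```python
import random

def laplace_item_mechanism(x, alpha, rng):
    """Add Laplace noise of scale 1/alpha to a scalar item x."""
    # The difference of two independent Exp(alpha) draws is Laplace(0, 1/alpha).
    return x + rng.expovariate(alpha) - rng.expovariate(alpha)

def shuffled_release(dataset, alpha, rng):
    """Apply the item-wise mechanism to every item, then shuffle the outputs."""
    outputs = [laplace_item_mechanism(x, alpha, rng) for x in dataset]
    rng.shuffle(outputs)  # a uniformly random order hides which item produced which output
    return outputs

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
K = [0.10, 0.12, 0.80, 0.85]
released = shuffled_release(K, alpha=4.0, rng=rng)
```

The output has the same length as the input, which is why this mechanism is analyzed in the bounded setting where the dataset size is public.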

1.1.2. Extending $d_{\textsf{EM}}$ -DP to the Unbounded Setting

We start our mechanism designs by considering the bounded data setting in both the local and central models of privacy (see Table 1), as this enables easier privacy analysis (Sec. 4). However, the bounded setting might be restrictive in practice as it cannot support use cases where users have different amounts of data, or where the data sizes are also private. To this end, we extend $d_{\textsf{EM}}$ -DP to the more general unbounded setting. We show that when user data is relatively homogeneous (such as when it is i.i.d.), the privacy analysis of the unbounded setting may be reduced to the bounded setting.

Specifically, in Sec. 5 we create a black-box projection mechanism which projects any unbounded dataset onto a dataset where each user contributes a fixed, predefined amount of data. This enables running any bounded $d_{\textsf{EM}}$ -DP mechanism on the projected data. Our projection mechanism samples a fixed number of dataset items with replacement from each user. The privacy analysis follows by showing that the $d_{\textsf{EM}}$ between any two datasets remains relatively unchanged by sampling, up to a small additive factor determined by a Chernoff bound.

One caveat is that the introduced additive factor necessitates a slight adjustment to the privacy semantics of $d_{\textsf{EM}}$ -DP. Instead of protecting any change of $d_{\textsf{EM}}$ distance $d$ with a privacy parameter $d\alpha$ , we consider a small threshold $r$ such that all changes less than $r$ are protected with a uniform parameter $r\alpha$ . In essence, this privacy guarantee provides $d_{\textsf{EM}}$ -DP at the granularity of units of $d_{\textsf{EM}}$ distance $r$ . We refer to this notion as discrete user-level $d_{\textsf{EM}}$ -DP (Def. 5.1) and have the following result:

Theorem 1.3.

(Informal version of Thm. 5.3) Suppose that for $n$ users, $\mathcal{M}$ is a mechanism which satisfies $(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP. The algorithm which, given arbitrary user datasets $K_{1},\ldots,K_{n}$ , takes $s$ i.i.d. samples from each $K_{i}$ and then applies $\mathcal{M}$ on each of the sampled data items, satisfies $(\alpha r,\delta,r)$ -discrete $d_{\textsf{EM}}$ -DP for all $r\geq\frac{2\ln(1/\delta)}{s}$ .

The two notions of privacy are nearly equivalent for small $r$ , showing that unbounded $d_{\textsf{EM}}$ -DP can be reduced to bounded $d_{\textsf{EM}}$ -DP with an almost exact translation of the privacy guarantee.
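
The sampling step of Theorem 1.3 can be sketched in a few lines of Python. The snippet below draws $s$ items with replacement from each user's dataset and computes the smallest threshold $r$ permitted by the informal statement, $r\geq\frac{2\ln(1/\delta)}{s}$ . The function names and toy data are assumptions for illustration, and every user's dataset is assumed to be non-empty.

```python
import math
import random

def project_by_sampling(user_datasets, s, rng):
    """Draw s items with replacement from each (non-empty) user dataset."""
    return [[rng.choice(K_i) for _ in range(s)] for K_i in user_datasets]

def smallest_threshold(s, delta):
    """Smallest r allowed by the informal bound r >= 2 ln(1/delta) / s."""
    return 2.0 * math.log(1.0 / delta) / s

rng = random.Random(1)  # fixed seed only to make the sketch reproducible
datasets = [[0.1, 0.2, 0.3], [0.9], [0.4, 0.5, 0.5, 0.6, 0.7]]
projected = project_by_sampling(datasets, s=4, rng=rng)  # every user now holds 4 items
r_min = smallest_threshold(s=200, delta=1e-6)
```

After projection every user holds exactly $s$ items regardless of the original size, so a bounded $d_{\textsf{EM}}$ -DP mechanism can be run on the result. Because $d_{\textsf{EM}}\leq 1$ , the threshold is only informative when $s$ is large enough that $r<1$ .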

1.1.3. Demonstrating Improvements Over User-level DP

We compare the privacy and utility of our proposed $d_{\textsf{EM}}$ -DP mechanisms with baseline mechanisms satisfying user-level DP. Specifically, in Sec. 6.1, we study a special type of linear query called linear embedding queries, and in Sec. 6.2, we study the problem of private frequency estimation. For simplicity, we consider the bounded data setting.

Let’s start by understanding the relationship between $(\alpha,\delta)$ - $d_{\textsf{EM}}$ -DP and $(\varepsilon,\delta)$ -user-level DP. The following observations hold in both the central and local models:

• $\alpha=\varepsilon$ : Since we assume $d_{\mathcal{X}}$ is normalized, we always have $d_{\textsf{EM}}\leq 1$ . Thus, in this case $(\varepsilon,\delta)$ - $d_{\textsf{EM}}$ -DP implies $(\varepsilon,\delta)$ -user-level DP. Moreover, for any pair of inputs $K,K^{\prime}$ such that $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})<1$ , the privacy protection of $d_{\textsf{EM}}$ -DP is stronger. (If $\alpha\leq\varepsilon$ , then user-level DP is strictly weaker than $d_{\textsf{EM}}$ -DP, so the more appropriate baseline is to use $\alpha=\varepsilon$ .)

• $\alpha>\varepsilon$ : In this case, some pairs of inputs (with a large $d_{\textsf{EM}}$ distance between them) are protected less strongly than they are under user-level DP. However, as indicated in our aforementioned real-life examples, input pairs with high $d_{\textsf{EM}}$ (i.e., dissimilar input pairs) are typically less sensitive.

Now, we interpret the theoretical error bounds for linear embedding queries. From Table 2(a), the error for releasing a $d$ -dimensional linear embedding query under user-level DP is $O(\frac{d}{\varepsilon n})$ , while it is $O(\frac{d}{\alpha n})$ for $d_{\textsf{EM}}$ -DP. When $\alpha=\varepsilon$ , these utilities are identical, but $d_{\textsf{EM}}$ -DP offers stronger privacy. When $\alpha>\varepsilon$ , the utility of $d_{\textsf{EM}}$ -DP is higher than that of user-level DP, with the two guarantees offering differing privacy semantics. Thus, in both cases, there is a clear benefit of using $d_{\textsf{EM}}$ -DP. These observations are the same in the local model.

Finally, for frequency estimation in the local model, Table 2(b) shows that the error of user-level DP is $O(\sqrt{\frac{k^{2}\ln(m/\delta)}{n\varepsilon^{2}}})$ , while it is $O(\sqrt{\frac{k^{3}}{n\alpha^{2}}\max\{\ln(\frac{m}{\delta}),\alpha\}})$ for $d_{\textsf{EM}}$ -DP. For constant $\varepsilon$ and $\alpha\geq\varepsilon^{2}\frac{k}{\ln(m/\delta)}$ , the utility is improved. In the central model, the error of the user-level DP algorithm is $O(\frac{k}{n\varepsilon})$ while it is $O(\frac{k^{3/2}}{n\alpha}\sqrt{\max\{\ln(\frac{m}{\delta}),\alpha\}})$ for $d_{\textsf{EM}}$ -DP. The algorithm under $d_{\textsf{EM}}$ -DP has the added benefit that it can be implemented in the shuffle model of privacy, which requires less trust and parallels prior work in the shuffle model (Feldman et al., 2022). There is a utility improvement for $\alpha\geq\varepsilon^{2}k$ . When $\varepsilon\leq\alpha\leq\varepsilon^{2}k$ , we leave it as an interesting open problem whether $d_{\textsf{EM}}$ -DP can offer utility improvements over user-level DP.

Linear Embedding Queries
Algorithm	Privacy Guarantee	Privacy Model	$\ell_{2}$ Error		Notes
Laplace Mechanism	$(\varepsilon,0)$ -user-level DP	Central, Bounded	$O(\frac{d}{\varepsilon n})$	(Lemma 6.3)	$d_{\textsf{EM}}$ -DP gives the same utility but stronger privacy for $\alpha=\varepsilon$ ; $d_{\textsf{EM}}$ -DP gives better utility but different privacy for $\alpha>\varepsilon$ .
PrivEMDLinear	$(\alpha,\delta)$ - $d_{\textsf{EM}}$ -DP	Central, Bounded	$O(\frac{d}{\alpha n}\sqrt{\ln\frac{1}{\delta}})$	(Lemma 6.2)

(a) Comparison of $d_{\textsf{EM}}$ -DP to user-level DP in the central model for releasing a $d$ -dimensional linear embedding query. The errors in the local model are a factor $\sqrt{n}$ higher.

Frequency Estimation
Algorithm	Privacy Guarantee	Privacy Model	$d_{\textsf{EM}}$ Error		Notes
Hadamard Response	$(\varepsilon,0)$ -user-level DP	Local, Bounded	$O\left(\sqrt{\frac{k^{2}\ln(m/\delta)}{n\varepsilon^{2}}}\right)$	(Lemma 6.4)	Assuming $k,\varepsilon,\alpha\leq\sqrt{m}$ ; $d_{\textsf{EM}}$ -DP gives better utility for $\alpha\geq\varepsilon^{2}\frac{k}{\ln(m/\delta)}$ .
PrivEMDItemWise	$(\alpha,\delta)$ - $d_{\textsf{EM}}$ -DP	Local, Bounded	$O\left(\sqrt{\frac{k^{3}}{n\alpha^{2}}\max\left\{\ln(\frac{m}{\delta}),\alpha\right\}}\right)$	(Thm. 6.6)
Laplace Mechanism	$(\varepsilon,0)$ -user-level DP	Central, Bounded	$O\left(\frac{k}{n\varepsilon}\right)$	(Lemma 6.7)	Assuming $n<\frac{m}{\alpha}$ ; $d_{\textsf{EM}}$ -DP gives better utility when $\alpha>\varepsilon^{2}k$ .
PrivEMDItemWise	$(\alpha,\delta)$ - $d_{\textsf{EM}}$ -DP	Central, Bounded (this algorithm also works in the shuffle model, which requires less trust than the central model)	$O\left(\frac{k^{3/2}}{n\alpha}\sqrt{\max\left\{\ln(\frac{m}{\delta}),\alpha\right\}}\right)$	(Corollary 6.8)

(b) Comparison of $d_{\textsf{EM}}$ -DP to user-level DP for frequency estimation in the setting defined in Sec. 6.2. $k$ is the domain size $|\mathcal{X}|$ .

Table 2. Summary of theoretical utility guarantees, assuming there are $n$ users who hold datasets of size $m$ .

2. Background

2.1. Differential Privacy

Intuitively, DP is a property of a mechanism which ensures that its output distribution remains insensitive to changes in the data of a single individual. The standard DP guarantee, which is also known as item-level DP, considers each user $U_{i}$ to contribute only a single item $x_{i}\in\mathcal{X}$ to a global dataset, i.e., $K_{i}=x_{i}$ . In this paper, we consider differential privacy at the user level. We start by considering the local model:

Definition 2.1 (Unbounded User-level Local DP (Acharya et al., 2023)).

We say a mechanism $\mathcal{M}$ acting on a dataset $K$ satisfies $(\varepsilon,\delta)$ -unbounded user-level local DP if, for all $K,K^{\prime}\in\mathcal{X}^{*}$ and all outputs $O$

(1)

\Pr[\mathcal{M}(K)=O]\leq e^{\varepsilon}\Pr[\mathcal{M}(K^{\prime})=O]+\delta.

Note that here we consider the more general unbounded data setting where the two datasets $\{K,K^{\prime}\}$ can have arbitrary sizes.

Next, we present the definition for the central model.

Definition 2.2 (Unbounded User-level Central DP (Liu et al., 2023)).

Let $K_{G}=K_{1}\cup\cdots\cup K_{n}$ denote a global dataset from $n$ users where $\forall i\in[n],K_{i}\in\mathcal{X}^{*}$ . We say $K_{G}\sim K^{\prime}_{G}$ , if $K^{\prime}_{G}$ can be obtained from $K_{G}$ by changing the dataset of a single user $U_{i}$ from $K_{i}$ to $K_{i}^{\prime}$ . We say a mechanism $\mathcal{M}$ acting on a dataset $K$ satisfies $(\varepsilon,\delta)$ -unbounded user-level central DP if, for all $K_{G},K_{G}^{\prime}$ such that $K_{G}\sim K^{\prime}_{G}$ , and all outputs $O$

(2)

\Pr[\mathcal{M}(K_{G})=O]\leq e^{\varepsilon}\Pr[\mathcal{M}(K_{G}^{\prime})=O]+\delta.

Note that there is no restriction in the sizes of the datasets $\{K_{i}\},i\in[n]$ in the above definition.

Next, we define metric DP ( $d_{\mathcal{X}}$ -DP) that enables the privacy guarantee to depend on the $d_{\mathcal{X}}$ distance between the pair of inputs. We start by introducing it at the item-level (so we consider changing an item $x\in\mathcal{X}$ to another item $x^{\prime}\in\mathcal{X}$ ). For simplicity, we consider the local model, so the mechanism acts on just a single item:

Definition 2.3 (Local $d_{\mathcal{X}}$ -DP (Alvim et al., 2018)).

We say $\mathcal{M}$ satisfies $(\alpha,\delta)$ -local $d_{\mathcal{X}}$ -DP if for all data elements $x,x^{\prime}\in\mathcal{X}$ , and all outputs $O$

\Pr[\mathcal{M}(x)=O]\leq e^{\alpha d_{\mathcal{X}}(x,x^{\prime})}\Pr[\mathcal{M}(x^{\prime})=O]+\delta.

We replace the traditional privacy parameter $\varepsilon$ with $\alpha$ in the above definition, because $\varepsilon$ in Definitions 2.1 and 2.2 is a unitless parameter while $\alpha$ has the inverse unit of $d_{\mathcal{X}}$ .
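
For a finite domain, the inequality in Definition 2.3 can be checked exhaustively. The following Python sketch does so for the pure case ( $\delta=0$ ) using an exponential-style mechanism with $\Pr[\mathcal{M}(x)=o]\propto\exp(-\alpha d_{\mathcal{X}}(x,o)/2)$ , which satisfies $\alpha$ - $d_{\mathcal{X}}$ -DP by the triangle inequality. This mechanism, the toy domain, and all function names are assumptions chosen for illustration; they are not the mechanisms studied in the paper.

```python
import math

def exponential_mechanism_table(domain, dist, alpha):
    """Build Pr[M(x) = o], proportional to exp(-alpha * dist(x, o) / 2)."""
    table = {}
    for x in domain:
        weights = {o: math.exp(-alpha * dist(x, o) / 2.0) for o in domain}
        total = sum(weights.values())
        table[x] = {o: w / total for o, w in weights.items()}
    return table

def satisfies_metric_dp(table, dist, alpha, tol=1e-12):
    """Check Pr[M(x)=o] <= exp(alpha * d(x, x')) * Pr[M(x')=o] for all x, x', o."""
    return all(
        table[x][o] <= math.exp(alpha * dist(x, y)) * table[y][o] + tol
        for x in table for y in table for o in table[x]
    )

# Toy domain on the unit interval with the absolute-difference metric.
domain = [0.0, 0.25, 0.5, 1.0]
dist = lambda a, b: abs(a - b)
table = exponential_mechanism_table(domain, dist, alpha=2.0)
ok = satisfies_metric_dp(table, dist, alpha=2.0)
```

The factor $1/2$ in the exponent accounts for the normalizing constant, which also changes with $x$ ; checking the same table against a smaller $\alpha$ fails, which shows how the guarantee scales with the distance between inputs.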

2.2. Earth-Mover’s Distance

Notations. We denote the set of all possible datasets as $\mathcal{X}^{*}$ . We will also view a dataset $K\in\mathcal{X}^{*}$ as a probability distribution defined by its normalized histogram $\tilde{K}$ . To do so, let $\Delta^{\mathcal{X}}\subseteq\mathbb{R}^{\mathcal{X}}$ denote the probability simplex indexed by $\mathcal{X}$ —i.e. the set of all vectors $\langle v_{x}\rangle_{x\in\mathcal{X}}$ such that $v_{x}\geq 0$ and $\sum_{x\in\mathcal{X}}v_{x}=1$ . For a dataset $K$ , $\tilde{K}\in\Delta^{\mathcal{X}}$ then denotes the probability distribution defined by $K$ , meaning $\tilde{K}[x]=\frac{\text{Num. occurrences of $x$ in $K$}}{|K|}$ . A natural way to extend the notion of distance from items in $\mathcal{X}$ to distributions in $\Delta^{\mathcal{X}}$ is to use the Earth-Mover’s (or $1$ -Wasserstein) distance (Givens and Shortt, 1984), which we now define. For a joint distribution $C(x_{1},x_{2})\in\Delta^{\mathcal{X}\times\mathcal{X}}$ , let $C_{x_{1}}(x_{2})$ denote the distribution conditioned on observing $x_{1}$ , and let $C_{1}(x_{1})$ denote the marginal distribution of $x_{1}$ . We define $C_{x_{2}}(x_{1})$ and $C_{2}(x_{2})$ similarly.

Definition 2.4.

For distributions $P,Q\in\Delta^{\mathcal{X}}$ , a joint distribution $C$ on $\mathcal{X}\times\mathcal{X}$ is a coupling between $P$ and $Q$ if $C_{1}=P$ and $C_{2}=Q$ . We let $\mathcal{C}(P,Q)$ denote the set of couplings between $P$ and $Q$ .

A coupling $C$ can be viewed as a “transportation plan” between $P$ and $Q$ , in the sense that if $C$ places $m$ probability mass at a point $(x_{1},x_{2})$ , then $m$ probability mass from $P$ at $x_{1}$ is transported to $Q$ at $x_{2}$ (or vice-versa). We define the cost of a coupling as the expected transportation distance given by $\mathbb{E}_{(x,x^{\prime})\sim C}d_{\mathcal{X}}(x,x^{\prime})$ . The earth-mover’s distance ( $d_{\textsf{EM}}$ ) between $P,Q$ is equal to the minimum possible cost of a coupling between $P$ and $Q$ :

d_{\textsf{EM}}(P,Q)=\inf_{C\in\mathcal{C}(P,Q)}\operatorname{\mathbb{E}}_{(x,x^{\prime})\sim C}d_{\mathcal{X}}(x,x^{\prime}).

Since we assume that $d_{\mathcal{X}}$ is bounded by $1$ , we have $d_{\textsf{EM}}(\cdot,\cdot)\leq 1$ .
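
The definitions above can be made concrete with a short Python sketch that builds the normalized histogram $\tilde{K}$ , checks that a candidate joint distribution is a coupling, and evaluates its cost. The helper names and the toy datasets on the unit interval are illustrative assumptions. The sketch only evaluates a given coupling; it does not search for the optimal one.

```python
from collections import Counter, defaultdict

def normalized_histogram(K):
    """Return the distribution K~ as a dict: item -> fraction of K equal to it."""
    counts = Counter(K)
    return {x: c / len(K) for x, c in counts.items()}

def is_coupling(C, P, Q, tol=1e-9):
    """Check that the two marginals of the joint distribution C are P and Q."""
    first, second = defaultdict(float), defaultdict(float)
    for (x1, x2), mass in C.items():
        first[x1] += mass
        second[x2] += mass

    def close(a, b):
        return all(abs(a.get(x, 0.0) - b.get(x, 0.0)) <= tol for x in set(a) | set(b))

    return close(first, P) and close(second, Q)

def coupling_cost(C, dist):
    """Expected transportation distance under the coupling C."""
    return sum(mass * dist(x1, x2) for (x1, x2), mass in C.items())

dist = lambda a, b: abs(a - b)
P = normalized_histogram([0.0, 0.0, 1.0, 1.0])
Q = normalized_histogram([0.0, 0.5, 1.0, 1.0])
# Move a quarter of the mass from 0.0 to 0.5 and leave the rest in place.
C = {(0.0, 0.0): 0.25, (0.0, 0.5): 0.25, (1.0, 1.0): 0.5}
cost = coupling_cost(C, dist)
```

Here a fraction $\tau=1/4$ of the items moves a distance $q=1/2$ , so the cost is $\tau q=1/8$ . Since $d_{\textsf{EM}}$ is an infimum over couplings, the cost of any valid coupling is an upper bound on $d_{\textsf{EM}}(P,Q)$ .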

Next we present the Birkhoff-Von Neumann Theorem which is useful in our privacy analysis in Sec. 4.2. The theorem states that if both $P$ and $Q$ are empirical distributions with the same number of points, then the $d_{\textsf{EM}}$ between them is the cost of the coupling that moves the entire mass in each point to the same destination:

Lemma 2.1.

[Birkhoff-Von Neumann Theorem (Konig, 2001); Lemma A.1 in (Fernandes et al., 2019)] For two datasets $K=\{x_{1},\ldots,x_{m}\}$ and $K^{\prime}=\{y_{1},\ldots,y_{m}\}$ , there is a permutation $\pi:[m]\rightarrow[m]$ such that

(3)

d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})=\frac{1}{m}\sum_{i=1}^{m}d_{\mathcal{X}}(x_{i},y_{\pi(i)}).
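To make Lemma 2.1 concrete, the following minimal sketch computes $d_{\textsf{EM}}$ between two equal-size datasets by searching over all permutations $\pi$. The function name `emd_equal_size` and the toy metric `abs_dist` are illustrative assumptions, not part of the paper, and the brute-force search is only feasible for very small $m$; a practical implementation would use an assignment solver instead.

```python
from itertools import permutations


def emd_equal_size(K, K_prime, dist):
    """Brute-force d_EM between two equal-size datasets, following Eq. (3).

    Tries every permutation pi and returns the smallest average matched
    distance. The cost grows as m!, so this is for illustration only.
    """
    if len(K) != len(K_prime) or len(K) == 0:
        raise ValueError("Lemma 2.1 needs two non-empty datasets of equal size")
    m = len(K)
    best_total = min(
        sum(dist(K[i], K_prime[pi[i]]) for i in range(m))
        for pi in permutations(range(m))
    )
    return best_total / m


# Hypothetical metric: absolute difference on [0, 1], so d_X is bounded by 1.
def abs_dist(a, b):
    return abs(a - b)


K = [0.0, 0.5, 1.0]
K_prime = [1.0, 0.0, 0.6]
d = emd_equal_size(K, K_prime, abs_dist)
```

In this toy instance the best matching pairs equal points with each other and moves only the item at 0.5 to 0.6, so the result is one third of that single displacement.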

3. Definition of $d_{\textsf{EM}}$ -DP

In this section, we introduce our generalization of metric DP to the user-level. We start with the local model. We use the $d_{\textsf{EM}}$ metric to measure the distance between two datasets $K,K^{\prime}$ since it captures the intuition that the changes which move smaller amounts of data by smaller distances are more sensitive (as discussed in Sec. 1).

Definition 3.1 ((Un)Bounded Local $d_{\textsf{EM}}$ -DP).

Let $\mathcal{M}$ be a mechanism which acts on a dataset $K$ . We say $\mathcal{M}$ satisfies $(\alpha,\delta)$ -bounded local $d_{\textsf{EM}}$ -DP if for any two datasets $K,K^{\prime}$ such that $|K|=|K^{\prime}|$ , and for any output $O$ , we have

(4)

\Pr[\mathcal{M}(K)=O]\leq e^{\alpha d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})}\Pr[\mathcal{M}(K^{\prime})=O]+\delta.

If the above equation holds for all datasets $K,K^{\prime}$ , regardless of whether $|K|=|K^{\prime}|$ , we say that $\mathcal{M}$ satisfies $(\alpha,\delta)$ -unbounded local $d_{\textsf{EM}}$ -DP.

For bounded $d_{\textsf{EM}}$ -DP, the size of the dataset is not protected, which is acceptable for applications where the amount of data is not sensitive. We explicitly differentiate between bounded and unbounded data since privacy analysis is easier under bounded $d_{\textsf{EM}}$ -DP by leveraging Lemma 2.1 (see Section 4).
In the central model, our goal is to protect changes in a single user’s dataset, transitioning from $K_{i}$ to $K^{\prime}_{i}$ , with a privacy guarantee that depends on $d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}^{\prime}_{i})$ . We consider the bounded data setting where each dataset $K_{i}$ has a publicly known fixed size $m$ .

Definition 3.2 (Bounded Central $d_{\textsf{EM}}$ -DP).

Let $K_{G}=K_{1}\cup\cdots\cup K_{n}$ denote a global dataset from $n$ users where $\forall i\in[n],K_{i}\in\mathcal{X}^{m}$ . We say $K_{G}\sim K^{\prime}_{G}$ if $K^{\prime}_{G}$ can be obtained from $K_{G}$ by changing the dataset of a single user $U_{i}$ from $K_{i}$ to $K_{i}^{\prime}$ . We say a mechanism $\mathcal{M}$ satisfies $(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP if, for all $K_{G},K_{G}^{\prime}$ such that $K_{G}\sim K^{\prime}_{G}$ , and all outputs $O$ , we have

\Pr[\mathcal{M}(K_{G})=O]\leq e^{\alpha d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})}\Pr[\mathcal{M}(K_{G}^{\prime})=O]+\delta.

In the above definition, the two global datasets $K_{G},K_{G}^{\prime}$ are indistinguishable with a privacy parameter $\alpha d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})$ . Since we consider the bounded data setting, neither the total number of users, $n$ , nor the size of the individual datasets, $m$ , is protected.

It is important to note that the above definition cannot be directly translated to the unbounded data setting. This limitation arises from the fact that if each $K_{i}$ is allowed to have an arbitrary size, then changing a single $K_{i}$ could potentially change the entirety of $K_{G}$ in the worst-case (where user $U_{i}$ contributes the entire global dataset). This essentially reduces the central model (Def. 3.2) to the local model (Def. 3.1). We circumvent this challenge and provide a privacy definition for the unbounded data setting in Sec. 5, by controlling the amount of data from each user.

Setting the Privacy Parameters. There are some semantic differences between the parameter $\alpha$ in Definitions 3.1 and 3.2, and $\varepsilon$ in Definitions 2.1 and 2.2. The privacy parameter $\varepsilon$ is unitless. On the other hand, $\alpha$ is not unitless – it has a unit inversely proportional to $d_{\textsf{EM}}$ . While $\varepsilon\gg 1$ is usually not considered acceptable for standard DP, it is not unreasonable to set $\alpha\gg 1$ in our case. This is acceptable if a strong privacy guarantee is needed only for input pairs that are close to each other, since $d_{\textsf{EM}}(\cdot,\cdot)\leq 1$ . For all $q,\tau\in[0,1]$ , let $\mathcal{E}(q,\tau)$ refer to the minimum privacy parameter that is acceptable over all data changes of the form

A $\tau$ -fraction of $K$ is changed by average distance $q$ .

Then, $\alpha$ may be set as $\alpha=\inf_{q,\tau\in[0,1]}\tfrac{\mathcal{E}(q,\tau)}{q\tau}$ , and we can verify that Definition 3.1 will protect an input pair with the corresponding budget $\mathcal{E}(q,\tau)$ . The parameter $\delta$ has the same interpretation as in standard DP, and should be set $\delta\ll\frac{1}{poly(n)}$ .

Concrete Example. Throughout this paper, we consider a dataset of $n=10^{5}$ users, each of whom contributes $m=10^{3}$ location data points over the period of a month. We use the length of the shortest path on Earth’s surface as our metric $d_{\mathcal{X}}$ . Suppose we want to protect a user’s location over any particular day within a radius of $1000$ miles, and the user’s location over the entire time period within a distance of $100$ miles. In the normalized metric space, these distances are $q_{1}=0.08$ and $q_{2}=0.008$ , respectively (footnote 5: the maximum surface distance between two points on Earth is $\approx 12930$ miles). They correspond to a fraction $\tau_{1}=\frac{1}{30}$ and $\tau_{2}=1$ of the user’s data changing, respectively. Suppose we want to protect both of these inputs with privacy parameter $\varepsilon=0.2$ . Hence, we set $\alpha=\min\left\{\frac{\varepsilon}{\tau_{1}q_{1}},\frac{\varepsilon}{\tau_{2}q_{2}}\right\}=\min\{75,25\}=25.$ This value is much higher than typical privacy parameters used in DP, and yet it is able to adequately protect the desired inputs. Finally, we will set $\delta=10^{-12}$ .
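The arithmetic of this example can be reproduced with a few lines. The sketch below is only an illustration of the rule $\alpha=\inf\frac{\mathcal{E}(q,\tau)}{q\tau}$ restricted to a finite list of requirements; the helper name `choose_alpha` and the tuple layout are assumptions made here.

```python
def choose_alpha(requirements):
    """Pick alpha from a finite list of (q, tau, eps) requirements.

    q   : normalized average distance moved (in [0, 1])
    tau : fraction of the user's data that changes (in [0, 1])
    eps : privacy budget acceptable for that kind of change
    A change of this form has d_EM at most q * tau, so alpha * q * tau <= eps
    must hold for every requirement; the largest such alpha is the minimum ratio.
    """
    return min(eps / (q * tau) for q, tau, eps in requirements)


# Rounded values from the running location example.
requirements = [
    (0.08, 1.0 / 30.0, 0.2),   # one day out of thirty moved by about 1000 miles
    (0.008, 1.0, 0.2),         # the whole month moved by about 100 miles
]
alpha = choose_alpha(requirements)
```

The second requirement is the binding one, which is why the example arrives at $\alpha=25$ rather than $75$.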

4. Mechanisms for $d_{\textsf{EM}}$ -DP

Now, we describe our mechanisms for releasing queries under $d_{\textsf{EM}}$ -DP. Throughout this section, we focus on the bounded data setting, and consider both the local and central models. In Sec. 4.1, we show how to bound the sensitivity of linear queries, which can then be released with the addition of calibrated noise. Then, in Sec. 4.2, we show that we can release a noisy representation of $\tilde{K}$ under $d_{\textsf{EM}}$ -DP by applying any $d_{\mathcal{X}}$ -DP mechanism to each item in $K$ , and aggregating the outputs.

4.1. Linear Queries

A non-adaptive linear query on a dataset $K$ computes the value of $F\tilde{K}$ , where $F\in\mathbb{R}^{d\times|\mathcal{X}|}$ is a matrix with $d$ rows. The linearity comes from the linear transformation $F$ ; our linear queries are normalized since they operate on $\tilde{K}$ rather than $K$ . Such normalized queries can be used for answering the fraction of users satisfying a predicate (Blum et al., 2013). Nevertheless, one can estimate the non-normalized query by multiplying by an estimate of $|K|$ .

Let us represent $F$ by a function $f:\mathcal{X}\rightarrow\mathbb{R}^{d}$ where $f(x)=F[x]$ , the $x$ th column of $F$ . The linear query can then be re-written as

(5)

q_{f}(K)=\mathbb{E}_{x\sim\tilde{K}}[f(x)].

Thus, we may interpret a linear query on $\tilde{K}$ as the expected value of $f$ over a random item from $K$ . Linear queries are simple but capable of expressing many indispensable tools in data analysis, and they are well-studied in differential privacy (Blum et al., 2013; Hardt and Talwar, 2010; Dwork et al., 2014). We will design a simple mechanism satisfying $d_{\textsf{EM}}$ -DP for releasing a linear query, based on bounding the sensitivity of $q_{f}$ under $d_{\textsf{EM}}$ . The sensitivity measures the maximum change in the output of $q_{f}$ , measured according to some norm $\|\cdot\|$ on $\mathbb{R}^{d}$ , relative to a change in the inputs by a certain $d_{\textsf{EM}}$ . This is given by:

\Delta_{\mathsf{EM}}(q_{f})=\max_{K,K^{\prime}\in\mathcal{X}^{*}}\frac{\|q_{f}(K)-q_{f}(K^{\prime})\|}{d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})}.

Naively, it is intractable to compute this sensitivity since there are exponentially many datasets of a given size. Additionally, this sensitivity might not always be bounded. For instance, consider two points $x,x^{\prime}$ that are close in $\mathcal{X}$ , but $f(x)$ is very far from $f(x^{\prime})$ . In this case, we cannot bound $\Delta_{\mathsf{EM}}$ , since the $K$ and $K^{\prime}$ which put all their mass on $x$ and $x^{\prime}$ , respectively, have $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})=d_{\mathcal{X}}(x,x^{\prime})$ and therefore $\frac{\|q_{f}(K)-q_{f}(K^{\prime})\|}{d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})}=\frac{\|f(x)-f(x^{\prime})\|}{d_{\mathcal{X}}(x,x^{\prime})}$ , which can be arbitrarily large. However, if $f$ is $\ell$ -Lipschitz, meaning

\max_{x,x^{\prime}\in\mathcal{X}}\frac{\|f(x)-f(x^{\prime})\|}{d_{\mathcal{X}}(x,x^{\prime})}\leq\ell,

then it is possible to bound $\Delta_{\mathsf{EM}}(q_{f})$ using $\ell$ . We do this by observing that, for any coupling between $\tilde{K},\tilde{K}^{\prime}$ , each mass that moves a distance $d$ may change $q_{f}$ by up to $\ell d$ (based on Eq. (5)). This allows us to compute the following bound.

Theorem 4.1.

Let $q_{f}(K)$ be a linear query of the form in (5), where $f:\mathcal{X}\rightarrow\mathbb{R}^{d}$ is $\ell$ -Lipschitz. Then, we have $\Delta_{\mathsf{EM}}(q_{f})\leq\ell$ .

Remarks. The above theorem implies that a reasonable upper bound for $\Delta_{\mathsf{EM}}(q_{f})$ can be made when the query function $f$ is smooth in terms of $d_{\mathcal{X}}$ . In Sec. 6.1 we outline a special type of linear query for which this is the case. Additionally, the aforementioned example illustrates that this sensitivity analysis is tight. This means that $d_{\mathcal{X}}$ , in addition to defining the privacy semantics, also influences the types of queries that can be answered with good utility.

Proof Sketch. Let $C$ be the minimum-cost coupling between $\tilde{K},\tilde{K}^{\prime}$ . Then $\|\mathbb{E}_{x\sim\tilde{K}}[f(x)]-\mathbb{E}_{x\sim\tilde{K}^{\prime}}[f(x)]\|$ can be bounded by the cost of transporting $\tilde{K}^{\prime}$ onto $\tilde{K}$ under $C$ , times the amount that $f$ can change per unit of distance moved, which is at most $\ell d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})$ . ∎
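Spelled out, the chain of inequalities behind this sketch can be written as follows. This is a reconstruction from the definitions of $q_{f}$ , Lipschitzness and $d_{\textsf{EM}}$ given above, offered as a reading aid rather than as the paper's full proof; it assumes a minimum-cost coupling $C$ exists, which holds for finite datasets.

```latex
\begin{align*}
\|q_f(K)-q_f(K')\|
  &= \Big\|\mathbb{E}_{(x,x')\sim C}\big[f(x)-f(x')\big]\Big\|
     && \text{($C$ has marginals $\tilde{K}$ and $\tilde{K}'$)}\\
  &\le \mathbb{E}_{(x,x')\sim C}\,\|f(x)-f(x')\|
     && \text{(convexity of the norm)}\\
  &\le \ell\;\mathbb{E}_{(x,x')\sim C}\,d_{\mathcal{X}}(x,x')
     && \text{($f$ is $\ell$-Lipschitz)}\\
  &= \ell\, d_{\textsf{EM}}(\tilde{K},\tilde{K}')
     && \text{($C$ is a minimum-cost coupling).}
\end{align*}
```

Dividing both sides by $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})$ and taking the maximum over pairs of datasets gives $\Delta_{\mathsf{EM}}(q_{f})\leq\ell$ .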

Using the upper bound on $\Delta_{\mathsf{EM}}(q_{f})$ , we follow a well-known approach for privately releasing a point with known sensitivity under a norm: sample a point $U$ uniformly from the unit sphere $\{x\in\mathbb{R}^{d}:\|x\|=1\}$ , and release $q_{f}+\ell gU$ , where $g\sim\Gamma(d,\omega)$ , the Gamma distribution with shape $d$ and scale $\omega$ (Hardt and Talwar, 2010). Here, $\omega$ is a scale parameter that differs between the central and local models, since the sensitivity of $q_{f}$ is smaller in the bounded central model. This mechanism, PrivEMDLinear, is outlined in Alg. 1. Combining Thm. 4.1 with a standard privacy analysis, we can show that PrivEMDLinear satisfies $(\alpha,0)$ - $d_{\textsf{EM}}$ -DP.

Lemma 4.2.

PrivEMDLinear (Alg. 1) with scale $\omega=\frac{1}{\alpha}$ satisfies $(\alpha,0)$ -unbounded local $d_{\textsf{EM}}$ -DP and with scale $\omega=\frac{1}{\alpha n}$ satisfies $(\alpha,0)$ -bounded central $d_{\textsf{EM}}$ -DP.

Remarks. When using the $1$ -norm, PrivEMDLinear becomes the multidimensional Laplace mechanism. We may instantiate PrivEMDLinear with any noise mechanism that preserves $\alpha\|\cdot\|_{p}$ -metric DP in the space $\mathbb{R}^{d}$ . In particular, under the $2$ -norm, we can add Gaussian noise of width $\frac{\omega\sqrt{1.25\ln(1/\delta)}}{\alpha}$ (Dwork et al., 2014), and this will give better utility than adding noise based on the Gamma distribution at the cost of $(\alpha,\delta)$ -bounded local $d_{\textsf{EM}}$ DP.

Concrete Example. In our location example, consider releasing the average distance of each point in $K$ from a particular city in the local model. This can be expressed with $f(x)=d_{\mathcal{X}}(x,c)$ , where $c$ is the city; by the triangle inequality this is $1$ -Lipschitz. PrivEMDLinear could then be applied to release $q_{f}(K_{i})$ plus noise of expected magnitude $\frac{\ell}{\alpha}=0.04$ per user; after averaging over the $n$ users, the noise is about $\frac{0.04}{\sqrt{n}}=1.26\times 10^{-4}$ , corresponding to an error of just $1.6$ miles.

Data: $q_{f}$ – a $d$ -dimensional linear query; $\ell$ – upper bound of the Lipschitz constant of $f$ ; $K$ – input dataset; $\omega$ – scale parameter

Result: An estimate of $q_{f}(K)$

Sample $U$ uniformly from $\{x\in\mathbb{R}^{d}:\|x\|=1\}$ ;

Sample $g\in\mathbb{R}$ from $\Gamma(d,\omega)$ ;

return $\hat{q}=q_{f}(\tilde{K})+\ell gU$ ;

Algorithm 1 PrivEMDLinear, an algorithm for releasing linear queries under bounded $d_{\textsf{EM}}$ -DP.
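The steps of Alg. 1 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: it fixes the norm to the Euclidean norm (so a uniform direction can be drawn by normalizing a Gaussian vector), represents $f$ as a Python callable returning a list of $d$ floats, and uses only the standard `random` module. The names `priv_emd_linear`, `city` and the toy data are hypothetical.

```python
import math
import random


def priv_emd_linear(K, f, ell, omega, rng=random):
    """Sketch of PrivEMDLinear for the l2 norm.

    K     : non-empty list of items
    f     : callable mapping an item to a list of d floats (assumed ell-Lipschitz)
    ell   : upper bound on the Lipschitz constant of f
    omega : scale of the Gamma-distributed noise magnitude
    rng   : object exposing gauss() and gammavariate(), e.g. random.Random(seed)
    """
    d = len(f(K[0]))
    # q_f(K): the average of f over the items of K, as in Eq. (5).
    q = [sum(f(x)[j] for x in K) / len(K) for j in range(d)]
    # Uniform direction on the l2 unit sphere: normalize a Gaussian vector.
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in z))
    U = [c / norm for c in z]
    # Noise magnitude g ~ Gamma(shape=d, scale=omega).
    g = rng.gammavariate(d, omega)
    return [q[j] + ell * g * U[j] for j in range(d)]


# Toy usage: average distance to a hypothetical "city" on the unit interval.
rng = random.Random(0)
city = 0.3
K = [0.1, 0.25, 0.9]
alpha = 25.0
estimate = priv_emd_linear(K, lambda x: [abs(x - city)], ell=1.0,
                           omega=1.0 / alpha, rng=rng)
```

For $d=1$ the direction is a random sign and the magnitude is exponential, so the sketch reduces to adding Laplace noise, in line with the remark on the $1$ -norm above.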

4.2. Unordered Release of Item-wise Queries

We now consider the problem of directly releasing a private query applied to each item in $K$ . This can provide a more fine-grained result than the aforementioned linear queries, which output only the average over all the items. We release the query results as an unordered list to take advantage of the fact that subsequent computation (such as aggregation) often does not depend on the ordering of the data (Feldman et al., 2022). Specifically, our second mechanism PrivEMDItemWise applies a mechanism $\mathcal{A}$ , which satisfies $\alpha_{0}$ - $d_{\mathcal{X}}$ -DP, to each item individually. We use $\mathcal{A}$ as a black box, making PrivEMDItemWise completely general-purpose. For example, one could let $\mathcal{A}$ be a private item-release mechanism (footnote 6: a number of metric DP mechanisms for releasing items in specific applications are mentioned in Sec. 7) and use PrivEMDItemWise to form a histogram of the dataset. $\mathcal{A}$ could also be a classifier, and PrivEMDItemWise can then release a simplified representation of the dataset.

Once PrivEMDItemWise applies $\mathcal{A}$ to each element in the dataset, it shuffles the results (to remove any ordering of the data) and outputs the shuffled list. This appears in Alg. 2, and a precursor to this algorithm appeared in (Fernandes et al., 2019).

Data: Dataset $K\in\mathcal{X}^{m}$ , Mechanism $\mathcal{A}:\mathcal{X}\rightarrow\mathcal{Y}$ satisfying $(\alpha_{0},0)$ - $d_{\mathcal{X}}$ -DP

Result: $L\in\mathcal{Y}^{m}$ , unordered list (multiset) of item-wise queries from $K$

$L=\emptyset$ ;

for $i=1,\ldots,m$ do

Add $\mathcal{A}(x_{i})$ to $L$ ;

end for

$\textrm{Shuffle}(L)$ ;

return $L$

Algorithm 2 PrivEMDItemWise, a general mechanism for releasing item-wise queries from $K$ as an unordered list under bounded $d_{\textsf{EM}}$ -DP
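Alg. 2 is short enough to sketch directly. In the sketch below the item-wise mechanism $\mathcal{A}$ is passed in as a callable; as one possible instantiation (an assumption made here for illustration, not a choice made by the paper) we take $\mathcal{X}\subseteq\mathbb{R}$ with $d_{\mathcal{X}}(x,y)=|x-y|$ and add Laplace noise of scale $1/\alpha_{0}$ , which is a standard way to obtain $\alpha_{0}$ - $d_{\mathcal{X}}$ -DP on the real line. The names `priv_emd_itemwise` and `laplace_mechanism` are hypothetical.

```python
import random


def priv_emd_itemwise(K, mechanism, rng=random):
    """Sketch of PrivEMDItemWise: apply A to every item, then shuffle.

    K         : list of items
    mechanism : callable A, assumed to satisfy (alpha_0, 0)-d_X-DP
    rng       : object exposing shuffle(), e.g. random.Random(seed)
    """
    out = [mechanism(x) for x in K]
    rng.shuffle(out)  # discard the ordering so outputs cannot be linked to indices
    return out


def laplace_mechanism(alpha0, rng=random):
    """Illustrative A for real-valued items under d_X(x, y) = |x - y|."""
    def A(x):
        # The difference of two Exp(rate=alpha0) draws is Laplace with scale 1/alpha0.
        return x + rng.expovariate(alpha0) - rng.expovariate(alpha0)
    return A


rng = random.Random(0)
K = [0.10, 0.12, 0.80, 0.81, 0.83]
released = priv_emd_itemwise(K, laplace_mechanism(alpha0=3.0, rng=rng), rng=rng)
```

The shuffle is what the privacy analysis that follows relies on; without it, only the weaker composition-based guarantee would apply.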

As PrivEMDItemWise does not hide the size of $K$ , we show it satisfies bounded $d_{\textsf{EM}}$ -DP. We use the following argument: for a neighboring dataset $K^{\prime}=\{x_{1}^{\prime},\ldots,x_{m}^{\prime}\}$ , by Lemma 2.1 there exists a permutation $\pi:[m]\rightarrow[m]$ satisfying (3). Observe that we release the query responses in an unordered fashion by explicitly shuffling them. This allows us to pair up each element $x_{i}$ with $x_{\pi(i)}^{\prime}$ and analyze the privacy guarantee of releasing $\mathcal{A}(x_{1}),\ldots,\mathcal{A}(x_{m})$ versus $\mathcal{A}(x_{\pi(1)}^{\prime}),\ldots,\mathcal{A}(x_{\pi(m)}^{\prime})$ . Prior work does this with composition (Fernandes et al., 2019).

However, composition is not the right tool for obtaining a tight privacy analysis. The reason is that composition assumes that each $\mathcal{A}(x_{i})$ is output sequentially, and in particular that it is possible to identify which output came from $\mathcal{A}(x_{i})$ and which came from $\mathcal{A}(x_{\pi(i)}^{\prime})$ . In our case, we output an unordered list, and it is not possible to link an output to an index $i$ . Based on this observation, our key idea is to leverage privacy amplification by shuffling (Feldman et al., 2022) instead, which can yield a much smaller privacy parameter when the output is order invariant.

In particular, our core technical contribution is to generalize privacy amplification by shuffling to $d_{\mathcal{X}}$ -DP. Specifically, we analyze the privacy guarantee between two multisets $\textsf{Shuffle}(\mathcal{A}(x_{1}),\ldots,\mathcal{A}(x_{m}))$ and $\textsf{Shuffle}(\mathcal{A}(x_{1}^{\prime}),\ldots,\mathcal{A}(x_{m}^{\prime}))$ when $\mathcal{A}$ satisfies $d_{\mathcal{X}}$ -DP, in terms of the vector of distances $v=(d_{\mathcal{X}}(x_{i},x_{i}^{\prime}))_{i=1}^{m}$ . The parameters we will be interested in are $\|v\|_{0}$ , the number of nonzero elements in $v$ , and $\|v\|_{1}$ , since in our different privacy models we will be able to bound both. Formally:

Theorem 4.3.

Suppose that $(\mathcal{X},d_{\mathcal{X}})$ is a metric space such that $d_{\mathcal{X}}(\cdot,\cdot)\leq 1$ , and that $\mathcal{A}$ is an $(\alpha_{0},0)$ - $d_{\mathcal{X}}$ -DP algorithm. Let $(x_{1},\ldots,x_{m})$ and $(x_{1}^{\prime},\ldots,x_{m}^{\prime})$ be two vectors, and define $v=(d_{\mathcal{X}}(x_{i},x_{i}^{\prime}))_{i=1}^{m}$ . Let $0<\delta<1$ be a constant, and suppose it holds that $\alpha_{0}<\ln(\frac{m}{16\ln(4m/\delta)})$ . Then, for all outputs $O$ , we have that

\Pr[\mathsf{Shuffle}(\mathcal{A}(x_{1}),\ldots,\mathcal{A}(x_{m}))=O]\leq e^{\alpha}\Pr[\mathsf{Shuffle}(\mathcal{A}(x_{1}^{\prime}),\ldots,\mathcal{A}(x_{m}^{\prime}))=O]+\delta e^{\alpha},

where

\alpha\leq\|v\|_{0}\ln\left(1+\frac{\exp(\nicefrac{\alpha_{0}\|v\|_{1}}{\|v\|_{0}})-1}{\exp(\nicefrac{\alpha_{0}\|v\|_{1}}{\|v\|_{0}})+1}\left(\frac{8\sqrt{e^{\alpha_{0}}\ln(4\|v\|_{0}/\delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}\right)\right).

Remarks. In particular, if $\alpha_{0}\leq\frac{\|v\|_{0}}{\|v\|_{1}}$ , the above bound is roughly $\frac{\alpha_{0}\|v\|_{1}}{\sqrt{m}}$ , which grows with just $\sqrt{m}$ (as $\|v\|_{1}\leq m$ ). The standard result is only applicable when $\mathcal{A}$ satisfies $\alpha$ -local DP, and just $x_{1}$ is changed to $x_{1}^{\prime}$ (since each user owns just one item). We generalize the state-of-the-art privacy amplification by shuffling analysis of Feldman et al. (Feldman et al., 2022) to $d_{\mathcal{X}}$ -DP, and our result may be of independent interest. We state our result generally in terms of the norms $\|v\|_{0}$ and $\|v\|_{1}$ since we will be applying it with different known bounds on these quantities.

Proof Sketch. We first generalize the analysis of amplification by shuffling to the datasets $(x_{1},x_{2},\ldots,x_{m})$ and $(x_{1}^{\prime},x_{2},\ldots,x_{m})$ , where $d_{\mathcal{X}}(x_{1},x_{1}^{\prime})\leq w_{1}$ . We show the resulting privacy parameter is given by $f(w_{1})$ where

f(w_{1})=\ln\left(1+\frac{e^{\alpha_{0}w_{1}}-1}{e^{\alpha_{0}w_{1}}+1}\left(\frac{8\sqrt{e^{\alpha_{0}}\ln(4/\delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}\right)\right).

Then, we apply group privacy $\|v\|_{0}$ times to show the general result holds with parameter $\sum_{i=1}^{\|v\|_{0}}f(w_{i})$ , where the $w_{i}$ are the nonzero entries of $v$ . The function $f$ is concave, so by Jensen’s inequality the worst-case amplification is simply $\|v\|_{0}f(\frac{\|v\|_{1}}{\|v\|_{0}})$ . ∎

Comparison with Composition. Analyzing Thm. 4.3 using the state-of-the-art composition results (Kairouz et al., 2015) and $\alpha_{0}\leq 1$ gives us

\alpha\leq O\left(\alpha_{0}\|v\|_{2}\sqrt{\ln\tfrac{1}{\delta}}\right).

However, we cannot form a satisfying bound on the $2$ -norm of $v$ —it is only possible to say $\|v\|_{2}\leq\|v\|_{1}$ which is tight when e.g. $\|v\|_{0}$ = 1. The bound is thus missing the factor of $\frac{1}{\sqrt{m}}$ —composition here does not leverage the fact that all $m$ items are released in a random order.

Combining (3) and Thm. 4.3, we obtain an improved privacy guarantee for PrivEMDItemWise. We may state this guarantee in both the bounded local and central models. In the local model, recall that each user is applying PrivEMDItemWise to their data. In the central model, the central server applies PrivEMDItemWise to the entire dataset, and thus releases the frequencies of $mn$ itemwise queries.

Theorem 4.4.

For any $\delta\in(0,1)$ , PrivEMDItemWise shown in Alg. 2 satisfies bounded local $(\alpha,\delta^{\prime})$ - $d_{\textsf{EM}}$ DP, where

\alpha=\sup_{w\in[0,1]}\frac{h(m;m,mw)}{w}\quad\text{and}\quad\delta^{\prime}=\delta e^{h(m;m,m)},

and

h(m;x_{0},x_{1})=x_{0}\ln\left(1+\frac{\exp(\nicefrac{\alpha_{0}x_{1}}{x_{0}})-1}{\exp(\nicefrac{\alpha_{0}x_{1}}{x_{0}})+1}\left(\frac{8\sqrt{e^{\alpha_{0}}\ln(4x_{0}/\delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}\right)\right).

Similarly, PrivEMDItemWise satisfies bounded central $(\alpha,\delta^{\prime})$ - $d_{\textsf{EM}}$ DP, where

\alpha=\sup_{w\in[0,1]}\frac{h(mn;m,mw)}{w}\quad\text{and}\quad\delta^{\prime}=\delta e^{h(mn;m,m)}.
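The quantities in Thm. 4.4 are easy to evaluate numerically. The sketch below implements $h$ as written above and approximates the supremum over $w$ on a finite grid; the grid approximation, the function names, and the argument order are assumptions of this illustration. The first argument `M` is the number of shuffled outputs ( $m$ in the local model, $mn$ in the central model).

```python
import math


def h(M, x0, x1, alpha0, delta):
    """h(M; x0, x1) from Thm. 4.4, with M the number of shuffled outputs."""
    a = alpha0 * x1 / x0
    ratio = math.expm1(a) / (math.exp(a) + 1.0)
    bracket = (8.0 * math.sqrt(math.exp(alpha0) * math.log(4.0 * x0 / delta))
               / math.sqrt(M)
               + 8.0 * math.exp(alpha0) / M)
    return x0 * math.log1p(ratio * bracket)


def amplification_applies(M, alpha0, delta):
    """Precondition of Thm. 4.3 on the per-item budget alpha_0."""
    return alpha0 < math.log(M / (16.0 * math.log(4.0 * M / delta)))


def itemwise_alpha(M, m, alpha0, delta, grid=1000):
    """Grid approximation of sup over w in (0, 1] of h(M; m, m*w) / w."""
    ws = [k / grid for k in range(1, grid + 1)]
    return max(h(M, m, m * w, alpha0, delta) / w for w in ws)


# Running example in the central model: n = 1e5 users, m = 1e3 items each.
n, m, delta = 10**5, 10**3, 1e-12
alpha_central = itemwise_alpha(n * m, m, alpha0=2.7, delta=delta)
delta_prime = delta * math.exp(h(n * m, m, m, 2.7, delta))
```

Because the ratio $h/w$ is largest for small $w$ , a finer grid near zero gives a slightly more conservative value; the grid size here is only a convenience.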

Remarks. Thm. 4.4 gives the tightest possible privacy parameters, but we may also give an asymptotic formula as follows. For desired privacy parameters $(\alpha,\delta)$ , one should set

(6)

\displaystyle\alpha_{0}=\begin{cases}\frac{\alpha}{32\sqrt{m\ln(4me^{\alpha}/\delta)}}&\text{ if $\alpha\leq 32\sqrt{m\ln(4me^{\alpha}/\delta)}$}\\ 2\ln\left(\frac{\alpha}{16\sqrt{m\ln(4me^{\alpha}/\delta)}}\right)&\text{ if $32\sqrt{m\ln(4me^{\alpha}/\delta)}<\alpha<m$}\end{cases}

and respectively

(7)

\displaystyle\alpha_{0}=\begin{cases}\frac{\alpha\sqrt{n}}{32\sqrt{m\ln(4me^{\alpha}/\delta)}}&\text{ if $\alpha\sqrt{n}\leq 32\sqrt{m\ln(4me^{\alpha}/\delta)}$}\\ 2\ln\left(\frac{\alpha\sqrt{n}}{16\sqrt{m\ln(4me^{\alpha}/\delta)}}\right)&\text{ if $32\sqrt{m\ln(4me^{\alpha}/\delta)}<\alpha\sqrt{n}<m\sqrt{n}$}\end{cases}

in order to achieve $d_{\textsf{EM}}$ -DP in the bounded local (respectively central) model. Assuming $\alpha\leq O(\ln(m))$ , this means that the per-item privacy parameter $\alpha_{0}$ will be roughly $\frac{\alpha}{\sqrt{m}}$ (resp. $\ln(\frac{\alpha\sqrt{n}}{\sqrt{m}})$ ) for releasing the $m$ samples; this is asymptotically better than the analysis with composition, which would require setting $\alpha_{0}=\frac{\alpha}{m}$ (resp. $\frac{\alpha}{m}$ ). Even with higher $\alpha=m^{c}$ for $c<1$ , the budget is still $\frac{\alpha}{\sqrt{m^{1+c}}}$ (resp. $\ln(\frac{\alpha\sqrt{n}}{\sqrt{m^{1+c}}})$ ), which are both significant asymptotic improvements.
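Eqs. (6) and (7) can be folded into one helper, since (7) is (6) with $\alpha$ replaced by $\alpha\sqrt{n}$ in the numerators and case conditions. The sketch below is an illustration only; the function name is hypothetical, and the logarithm is expanded as $\ln(4m/\delta)+\alpha$ to avoid overflow in $e^{\alpha}$ .

```python
import math


def alpha0_asymptotic(alpha, m, delta, n=1):
    """Per-item budget alpha_0 from Eq. (6) (n = 1, local) or Eq. (7) (central).

    alpha : target d_EM-DP parameter, must satisfy alpha < m
    m     : number of items per user
    delta : parameter delta appearing in Eqs. (6) and (7)
    n     : number of users (1 recovers the local-model formula)
    """
    if not alpha < m:
        raise ValueError("Eqs. (6)-(7) are stated only for alpha < m")
    # sqrt(m * ln(4 m e^alpha / delta)), with the logarithm expanded.
    root = math.sqrt(m * (math.log(4.0 * m / delta) + alpha))
    scaled = alpha * math.sqrt(n)
    if scaled <= 32.0 * root:
        return scaled / (32.0 * root)
    return 2.0 * math.log(scaled / (16.0 * root))


# Running example: alpha = 25, m = 1e3, delta = 1e-12.
local_alpha0 = alpha0_asymptotic(25.0, 10**3, 1e-12)
central_alpha0 = alpha0_asymptotic(25.0, 10**3, 1e-12, n=10**5)
composition_alpha0 = 25.0 / 10**3  # baseline alpha / m from the text
```

These closed forms are looser than the exact expression in Thm. 4.4, so the values they return are conservative relative to a direct numerical evaluation of $h$ .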

Concrete Example. Our improved analysis makes the most significant improvements in the central model. Here, with composition we would have to apply PrivEMDItemWise with $\alpha_{0}=\frac{\alpha}{m}=0.025$ for each of the $m=10^{3}$ location data points per user. Using the guarantee of Thm. 4.4, it is possible to set $\alpha_{0}\approx 3.0$ , an improvement of roughly two orders of magnitude.

5. Generalization to Unbounded DP

The mechanisms presented so far face two challenges when applied to the unbounded data setting. First, a direct privacy analysis of the unbounded data setting is difficult since we cannot leverage Lemma 2.1, which significantly simplifies the analysis (for the bounded data setting). Second, and more importantly, the unbounded central model offers no utility improvement over the local model. In the worst-case scenario, a single user may contribute nearly all the data in the dataset, effectively reducing any algorithm to satisfying only local $d_{\textsf{EM}}$ -DP. This issue has been noted in previous work in user-level DP (Liu et al., 2023).

In this section, we tackle these challenges by showing a blackbox reduction from unbounded $d_{\textsf{EM}}$ -DP to bounded $d_{\textsf{EM}}$ -DP. Our reduction works in both the local and central models. The key idea of the reduction is to smoothly project a dataset $K$ of any size to a dataset $L$ of a given fixed size, such that the $d_{\textsf{EM}}$ distance between any two input datasets and the $d_{\textsf{EM}}$ distance between their projections are roughly the same. Then, it is easy to show that applying any bounded $d_{\textsf{EM}}$ -DP algorithm to the smooth projections is sufficient to guarantee unbounded $d_{\textsf{EM}}$ -DP for the entire scheme.

Our proposed projection mechanism is smooth in a near-multiplicative sense, albeit with a small additive penalty when the $d_{\textsf{EM}}$ between the two datasets is small. We account for this subtlety by slightly modifying the privacy semantics of $d_{\textsf{EM}}$ -DP in the unbounded setting so that the guarantee does not grow arbitrarily strong as $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\rightarrow 0$ . Instead, we introduce a distance threshold $r$ such that all pairs with $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\leq r$ enjoy a uniform privacy guarantee of $\varepsilon$ . This refined privacy definition, termed discrete $d_{\textsf{EM}}$ -DP, is formalized (in the local model) as:

Definition 5.1.

[Discrete Local $d_{\textsf{EM}}$ -DP] Let $\mathcal{M}$ be a mechanism which acts on a dataset $K$ . We say $\mathcal{M}$ satisfies $(\varepsilon,\delta,r)$ -discrete local $d_{\textsf{EM}}$ -DP if, for any two datasets $K,K^{\prime}\in\mathcal{X}^{*}$ such that $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\leq r$ ,

\Pr[M(K)=O]\leq e^{\varepsilon}\Pr[M(K^{\prime})=O]+\delta.

Like in standard DP, the above definition uses the parameter $\varepsilon$ because it is a unitless privacy parameter—the unit of the metric is expressed in the parameter $r$ .

Fact 5.1.

For any $K,K^{\prime}$ such that $d=d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})$ , $\mathcal{M}$ satisfies

\Pr[M(K)=O]\leq e^{\varepsilon\lceil\frac{d}{r}\rceil}\Pr[M(K^{\prime})=O]+\delta\exp(\lceil\tfrac{d}{r}\rceil).

Fact 5.1 is implied from Definition 5.1 by a direct application of group privacy (Dwork, 2006) (proven in Lemma A.2). This guarantee can be interpreted as providing $d_{\textsf{EM}}$ -DP at the granularity of units of $d_{\textsf{EM}}$ distance $r$ . Note that for all $d\geq r$ , we have $\varepsilon\lceil\frac{d}{r}\rceil\leq\frac{2\varepsilon}{r}d$ . Thus, $(\varepsilon,\delta,r)$ -discrete local $d_{\textsf{EM}}$ -DP is roughly equivalent to $(\frac{2\varepsilon}{r},\delta)$ -unbounded local $d_{\textsf{EM}}$ -DP, except if $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\leq r$ . In this case, the privacy parameter will not go below $\varepsilon$ . This adjustment does not significantly alter the overall privacy semantics of $d_{\textsf{EM}}$ -DP; one may simply set $\alpha$ as described in Sec. 3.

In the central model, we make a similar definition:

Definition 5.2.

[Discrete Central $d_{\textsf{EM}}$ -DP] Let $K_{G}=K_{1}\cup\cdots\cup K_{n}$ denote a global dataset from $n$ users (of any size). We say $K_{G}\sim_{r}K^{\prime}_{G}$ if $K^{\prime}_{G}$ can be obtained from $K_{G}$ by changing $K_{i}$ to $K_{i}^{\prime}$ for just one user $i$ , such that $d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}^{\prime}_{i})\leq r$ . We say a mechanism $\mathcal{M}(K_{G})$ satisfies $(\varepsilon,\delta,r)$ -discrete central $d_{\textsf{EM}}$ -DP if, for all $K_{G},K_{G}^{\prime}$ such that $K_{G}\sim_{r}K_{G}^{\prime}$ , we have

\Pr[M(K_{G})=O]\leq e^{\varepsilon}\Pr[M(K^{\prime}_{G})=O]+\delta.

As before, $(\varepsilon,\delta,r)$ -discrete central $d_{\textsf{EM}}$ -DP is roughly equivalent to $(\frac{2\varepsilon}{r},\delta)$ -bounded central $d_{\textsf{EM}}$ -DP when all user datasets have size $m$ . However, we will see that Definition 5.2 is the appropriate generalization to unbounded user datasets under our projection mechanism which is described below.

Our projection mechanism first generates a fixed number of samples with replacement from each user’s dataset $K$ . Next, it applies a blackbox bounded $d_{\textsf{EM}}$ -DP mechanism, $\mathcal{A}$ , to the projected dataset $L$ . By blackbox application we mean that $\mathcal{A}$ can be any arbitrary mechanism as long as it satisfies bounded $d_{\textsf{EM}}$ -DP. The projection mechanism, in the central model, is outlined in Alg. 3 (in the local model, each user samples from their own $K_{i}$ , so we would simply have $n=1$ ). The smoothness of the projection from $K_{G}$ to $L$ follows from the following claim:

Lemma 5.2.

Let $\tilde{K},\tilde{K}^{\prime}\in\Delta^{\mathcal{X}}$ be probability distributions, and let $C^{*}$ be the minimum cost coupling between $\tilde{K},\tilde{K}^{\prime}$ . Let $\{(x_{i},y_{i})\}_{i=1}^{s}$ denote $s$ i.i.d. samples from $C^{*}$ , and let $L=(x_{1},\ldots,x_{s})$ and $L^{\prime}=(y_{1},\ldots,y_{s})$ . Then,

\Pr[d_{\textsf{EM}}(\tilde{L},\tilde{L}^{\prime})\geq(1+\sqrt{2})d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})+\tfrac{3}{s}\ln(\tfrac{1}{\delta})]\leq\delta.

Data: $K_{G}$ - global dataset of $n$ users; $\mathcal{A}$ - a mechanism satisfying bounded $d_{\textsf{EM}}$ -DP; $s$ - number of samples.

$L=\emptyset$ ;

for $i=1$ to $n$ do

Add $s$ uniform samples with replacement from $K_{i}$ to $L$

end for

$O=\mathcal{A}(L)$ ;

return $O$

Algorithm 3 BoundedEMDReduction, a reduction from unbounded $d_{\textsf{EM}}$ -DP to bounded $d_{\textsf{EM}}$ -DP.
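A minimal sketch of Alg. 3 follows. It assumes each user's dataset is a non-empty Python sequence and that the bounded mechanism $\mathcal{A}$ is a callable receiving the flat list of $ns$ samples, with user $i$ 's samples occupying one contiguous block; that layout and the function name are choices made for this illustration.

```python
import random


def bounded_emd_reduction(user_datasets, mechanism, s, rng=random):
    """Sketch of BoundedEMDReduction.

    user_datasets : list of non-empty sequences K_1, ..., K_n of any sizes
    mechanism     : callable A, assumed to satisfy bounded d_EM-DP on inputs
                    in which every user contributes exactly s items
    s             : number of samples drawn per user
    rng           : object exposing choices(), e.g. random.Random(seed)
    """
    L = []
    for K_i in user_datasets:
        # s uniform samples with replacement project K_i to a fixed size.
        L.extend(rng.choices(K_i, k=s))
    return mechanism(L)


# Toy usage with an identity "mechanism", only to inspect the projection itself.
rng = random.Random(0)
users = [[0.1], [0.4, 0.5, 0.6], [0.9, 0.95]]
projected = bounded_emd_reduction(users, lambda L: L, s=4, rng=rng)
```

In the local model the same function would be called with a single user's dataset, matching the remark that one simply has $n=1$ .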

Proof Sketch: Intuitively, for the minimum-cost coupling $C$ between $\tilde{K}_{i}$ and $\tilde{K}_{i}^{\prime}$ , we can simulate sampling $s$ times from $\tilde{K}_{i}^{\prime}$ by first sampling $\{x_{1},\ldots,x_{s}\}$ from $\tilde{K}_{i}$ , and then sampling $y_{j}\sim C_{x_{j}}(\cdot)$ for each $j$ . This view shows there is a transportation plan between $L=\{x_{1},\ldots,x_{s}\}$ and $L^{\prime}=\{y_{1},\ldots,y_{s}\}$ of expected cost $\mathbb{E}_{x\sim C_{1},y\sim C_{x}}d_{\mathcal{X}}(x,y)=d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})$ . Using Bernstein’s inequality, we can show that with probability at least $1-\delta$ , $d_{\textsf{EM}}(\tilde{L},\tilde{L}^{\prime})$ is upper bounded by $(1+\sqrt{2})d_{\textsf{EM}}(\tilde{K}_{i},\tilde{K}_{i}^{\prime})+\frac{3}{s}\ln(\frac{1}{\delta})$ , matching Lemma 5.2. ∎

Thus, sampling is a smooth projection from $K_{G}$ to $L$ , with the caveat that there is an additive $\frac{\log(1/\delta)}{s}$ term that comes into play if $s$ is too small to guarantee convergence. This is the reason for our relaxation to discrete $d_{\textsf{EM}}$ -DP.

In summary, we have the following privacy guarantee:

Theorem 5.3.

Let $\varepsilon>0$ and $\delta,r\in[0,1]$ be arbitrary constants. Suppose $\mathcal{M}$ is a mechanism which satisfies $(\alpha,\delta)$ -bounded local $d_{\textsf{EM}}$ -DP (Definition 3.1), where

\alpha=\tfrac{\varepsilon}{(1+\sqrt{2})r+\tfrac{3}{s}\ln(\tfrac{1}{\delta})}.

Then, BoundedEMDReduction satisfies $(\varepsilon,2\delta,r)$ -discrete local $d_{\textsf{EM}}$ -DP. Similarly, if $\mathcal{M}$ satisfies $(\alpha,\delta)$ -bounded central $d_{\textsf{EM}}$ -DP (Definition 3.2), then BoundedEMDReduction satisfies $(\varepsilon,2\delta,r)$ -discrete central $d_{\textsf{EM}}$ -DP.

Remarks. If the number of samples $s$ is at least $\frac{\ln(1/\delta)}{r}$ , then Thm. 5.3 shows there is only a small multiplicative cost to considering just bounded $d_{\textsf{EM}}$ -DP (in the respective local or central model). In this case, the bounded algorithm will need to roughly satisfy $(\frac{\varepsilon}{r},\delta)$ -bounded $d_{\textsf{EM}}$ -DP, and this is roughly the same as the resulting $(\varepsilon,\delta,r)$ -discrete $d_{\textsf{EM}}$ -DP algorithm. There is no privacy disadvantage to taking a large number of samples, and the utility may also increase due to more information about the dataset being captured (recall that the projection does not provide privacy; it is provided by $\mathcal{M}$ ). Thus, the number of samples may be set to be large, with computational cost being the only constraint.
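The effect of the sample count $s$ on the budget required of the bounded mechanism can be read off the formula in Thm. 5.3. The short sketch below evaluates it for a few values of $s$ ; the function name and the chosen values of $\varepsilon$ , $r$ and $\delta$ (borrowed from the running example) are illustrative assumptions.

```python
import math


def bounded_alpha(eps, delta, r, s):
    """alpha required of the bounded mechanism in Thm. 5.3.

    eps, r : target discrete d_EM-DP parameters
    delta  : failure probability, in (0, 1)
    s      : number of samples drawn per user
    """
    return eps / ((1.0 + math.sqrt(2.0)) * r + (3.0 / s) * math.log(1.0 / delta))


eps, delta, r = 0.2, 1e-12, 0.008
threshold = math.log(1.0 / delta) / r          # s >= ln(1/delta) / r from the remark
alphas = {s: bounded_alpha(eps, delta, r, s) for s in (10**2, 10**3, 10**4, 10**6)}
limit = eps / ((1.0 + math.sqrt(2.0)) * r)     # value approached as s grows
```

As $s$ grows, the additive term vanishes and the required $\alpha$ increases toward `limit`, which is why larger sample counts never hurt the privacy accounting.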

Proof Sketch. Let $L$ and $L^{\prime}$ denote the sampled data for $K_{G}$ and $K_{G}^{\prime}$ , respectively. By convexity of DP, it suffices to analyze the privacy guarantee for any coupling between the random variables $L,L^{\prime}$ . In particular, we use the optimal coupling between $K_{G},K_{G}^{\prime}$ to define this coupling. The resulting privacy parameter is bounded in terms of the expected cost $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})$ by Lemma 5.2. ∎

BoundedEMDReduction can be used to bound the contribution of each user in the central setting, allowing us to apply the simpler Definition 3.2. In addition, it can be used to adapt PrivEMDItemWise to the unbounded data setting. One caveat is that utility may not be preserved if the number of user samples is too small or, in the central setting, if the users’ data distributions are heterogeneous. In particular, if users have varying numbers of samples, each from different distributions, applying BoundedEMDReduction equalizes the frequency of all user data. Nonetheless, it is often reasonable to assume the users have homogeneous data distributions (Liu et al., 2020; Acharya et al., 2023).

6. Applications of Proposed Mechanisms

In this section, we compare the utilities of PrivEMDLinear and PrivEMDItemWise to existing mechanisms satisfying user-level DP. For simplicity, we assume the bounded data setting.
Notations. We define the following quantities of a real matrix $M\in\mathbb{R}^{d\times k}$ . First, the $(p,q)$ operator norm of $M$ , denoted by $\|M\|_{p\rightarrow q}$ , is given by $\|M\|_{p\rightarrow q}=\sup_{x\in\mathbb{R}^{k},\|x\|_{p}\leq 1}\|Mx\|_{q}$ . We can show that $\|M\|_{1\rightarrow 2}$ is equal to the maximum $\ell_{2}$ norm of a column of $M$ . Furthermore, $\|M\|_{2\rightarrow 2}$ , more commonly written as $\|M\|_{2}$ , is the spectral norm and is equal to the maximum singular value of $M$ . Matrix norms satisfy the important submultiplicative property, which states that $\|MN\|_{p\rightarrow r}\leq\|M\|_{q\rightarrow r}\|N\|_{p\rightarrow q}$ for any matrices $M,N$ of compatible dimensions and $p,q,r\geq 1$ . Next, let $I_{d}$ denote the $d\times d$ identity matrix, and again suppose that $M\in\mathbb{R}^{d\times k}$ with $d\leq k$ . If $M$ has full row rank, then there exists a matrix $N\in\mathbb{R}^{k\times d}$ such that $MN=I_{d}$ . We call such a matrix $N$ a right inverse of $M$ . Finally, for $M\in\mathbb{R}^{s_{1}\times t_{1}}$ and $N\in\mathbb{R}^{s_{2}\times t_{2}}$ , let $M\otimes N\in\mathbb{R}^{s_{1}s_{2}\times t_{1}t_{2}}$ denote the Kronecker product of two real matrices, whose entry in $((i_{1},i_{2}),(j_{1},j_{2}))$ is $M_{i_{1}j_{1}}N_{i_{2}j_{2}}$ .
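The notation above can be made concrete on a toy matrix. The following is a minimal sketch in plain Python; the matrices and helper names (`matmul`, `norm_1_to_2`, `kron`) are illustrative assumptions and do not come from the paper. It checks a right inverse, the column-norm characterization of $\|M\|_{1\rightarrow 2}$ , and the Kronecker indexing convention.

```python
import math
import random

def matmul(M, N):
    """Product of two matrices stored as lists of rows."""
    return [[sum(M[i][l] * N[l][j] for l in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def norm_1_to_2(M):
    """(1 -> 2) operator norm: the largest l2 norm of a column of M."""
    return max(math.sqrt(sum(row[j] ** 2 for row in M))
               for j in range(len(M[0])))

def kron(M, N):
    """Kronecker product; row (i1, i2) is stored at index i1 * s2 + i2."""
    return [[M[i1][j1] * N[i2][j2]
             for j1 in range(len(M[0])) for j2 in range(len(N[0]))]
            for i1 in range(len(M)) for i2 in range(len(N))]

# Hypothetical example with d = 2, k = 3 and full row rank.
M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
# One possible right inverse of M (right inverses are not unique).
N = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
assert matmul(M, N) == [[1.0, 0.0], [0.0, 1.0]]

# No point of the l1 unit sphere is mapped further than the longest column.
rng = random.Random(0)
bound = norm_1_to_2(M)
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    scale = sum(abs(v) for v in x)
    x = [v / scale for v in x]
    Mx = [sum(M[i][j] * x[j] for j in range(3)) for i in range(2)]
    assert math.sqrt(sum(v * v for v in Mx)) <= bound + 1e-12
```

The supremum over the $\ell_{1}$ ball is attained at a signed standard basis vector, which is why the random search never exceeds the largest column norm.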

6.1. Linear Embedding Queries

Many applications of metric DP assume there is an embedding function $\phi:\mathcal{X}\rightarrow\mathbb{R}^{t}$ , which maps an item to its semantic representation in $\mathbb{R}^{t}$ (each of the examples in Sec. 1 has an embedding representation). The metric $d_{\mathcal{X}}$ is then the distance between $\phi(x)$ and $\phi(x^{\prime})$ ; in this section, we consider the $\ell_{2}$ distance.

Since $\phi(x)$ also communicates information about the item $x$ , we define linear embedding queries as linear queries applied to an item’s embedding $\phi(x)$ . Formally,

q_{f\circ\phi}(K)=\mathbb{E}_{x\sim\tilde{K}}[f\circ\phi(x)],

where $f(y)=Fy$ for a matrix $F\in\mathbb{R}^{d\times t}$ (meaning that $f$ is a linear function). Assume each row $F_{i}$ of $F$ is normalized so that $\|F_{i}\|_{2}\leq 1$ . The $i$ -th coordinate of $q_{f\circ\phi}(K)$ is equal to $\mathbb{E}_{x\sim\tilde{K}}[\langle F_{i},\phi(x)\rangle]$ . This can be interpreted as the average similarity of the items in $K$ with the vector $F_{i}$ . Our analysis will assume that $d<|\mathcal{X}|$ and $d\ll n$ , which is usually the case in practice. Note that we may write $q_{f\circ\phi}$ as $F\Phi\tilde{K}$ , where $\Phi\in\mathbb{R}^{t\times\mathcal{X}}$ is the collection of embedding vectors in $\mathcal{X}$ .
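As an illustration of the identity $q_{f\circ\phi}=F\Phi\tilde{K}$ , the sketch below evaluates a linear embedding query in two ways on a toy instance. The domain, the embedding `phi`, and the matrix `F` are hypothetical values chosen only for the example.

```python
from collections import Counter

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Hypothetical toy instance: 3 items embedded in R^2 (t = 2), d = 2 query rows.
domain = ["a", "b", "c"]
phi = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
F = [[1.0, 0.0],
     [0.6, 0.8]]                      # every row has l2 norm at most 1

K = ["a", "a", "c", "b"]              # the pooled items

# Normalized histogram K~ over the domain.
counts = Counter(K)
K_tilde = [counts[x] / len(K) for x in domain]

# Phi is t x |X|: the column indexed by x holds phi(x).
Phi = [[phi[x][row] for x in domain] for row in range(2)]

# Matrix form F Phi K~ of the query.
q_matrix = matvec(F, matvec(Phi, K_tilde))

# Direct definition: the average of F phi(x) over the items.
q_direct = [sum(matvec(F, phi[x])[i] for x in K) / len(K)
            for i in range(len(F))]

assert all(abs(u - v) < 1e-12 for u, v in zip(q_matrix, q_direct))
```

Both routes agree because the expectation over $\tilde{K}$ is itself a linear map of the histogram.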

6.1.1. Local Model

Existing user-level DP solutions ask each user $U_{i}$ to privately release the query $\hat{q}_{i}=q_{f\circ\phi}(\tilde{K}_{i})$ . The aggregator computes the average $\hat{q}=\frac{1}{n}\sum_{i=1}^{n}\hat{q}_{i}$ . The current best solutions have the following error guarantee (Duchi et al., 2013; Bassily, 2019):

Lemma 6.1.

(From Proposition 3 in (Duchi et al., 2013)) There exists an $(\varepsilon,0)$ -bounded user-level DP algorithm in the local model which produces an estimate $\hat{q}$ such that, for all $\tilde{K}$ ,

\mathbb{E}[\|\hat{q}-q_{f\circ\phi}(\tilde{K})\|_{2}]\leq O\left(\|F\Phi\|_{1\rightarrow 2}\tfrac{\sqrt{d}}{\varepsilon\sqrt{n}}\right).

To interpret the term $\|F\Phi\|_{1\rightarrow 2}$ , we can use the inequality $\|F\Phi\|_{1\rightarrow 2}\leq\|F\|_{2}\|\Phi\|_{1\rightarrow 2}$ , which is tight for certain choices of $F$ and $\Phi$ . By assumption, we know $\|F\|_{2}\leq\sqrt{d}$ and $\|\Phi\|_{1\rightarrow 2}\leq 1$ , both of which can also be tight. The bound is thus $O(\frac{d}{\varepsilon\sqrt{n}})$ .

On the other hand, for $d_{\textsf{EM}}$ -DP, by Thm. 4.1, we know that $\Delta_{\mathsf{EM}}(q_{f\circ\phi})$ is at most the Lipschitz constant of $f\circ\phi$ given by:

\max_{x,x^{\prime}\in\mathcal{X}}\tfrac{\|F(\phi(x))-F(\phi(x^{\prime}))\|}{\|\phi(x)-\phi(x^{\prime})\|}\leq\max_{x,x^{\prime}\in\mathcal{X}}\tfrac{\|F(\phi(x)-\phi(x^{\prime}))\|}{\|\phi(x)-\phi(x^{\prime})\|}\leq\|F\|_{2}.

Hence, each user can apply PrivEMDLinear with the Gaussian mechanism with $\ell=\|F\|_{2}$ , which gives the following utility guarantee:

Lemma 6.2.

There exists an $(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP algorithm in the local model which produces an estimate $\hat{q}$ such that, for all $\tilde{K}$ ,

\mathbb{E}[\|\hat{q}-q_{f\circ\phi}(\tilde{K})\|_{2}]\leq\|F\|_{2}\tfrac{\sqrt{1.25d\ln(1/\delta)}}{\alpha\sqrt{n}}.

Remarks. We use the Gaussian mechanism because it performs better under the $\ell_{2}$ error than the pure $(\alpha,0)$ -bounded local $d_{\textsf{EM}}$ -DP mechanism illustrated in Alg. 1. However, this forces us to use $\delta>0$ . We leave it as an interesting open question whether similar error can be achieved with $\delta=0$ . Compared to Lemma 6.1, the above bound differs by a factor of $\frac{\varepsilon}{\alpha}$ (and small $\ln\frac{1}{\delta}$ terms). When $\alpha=\varepsilon$ , we know that $d_{\textsf{EM}}$ -DP provides better privacy. When $\varepsilon\ll\alpha$ , PrivEMDLinear offers lower error than the algorithm of Lemma 6.1.
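To get a feel for the comparison between the two local-model bounds, the following sketch evaluates them numerically. It is only a rough illustration: the hidden constant in the $O(\cdot)$ of Lemma 6.1 is set to $1$ , the worst case $\|F\|_{2}=\sqrt{d}$ is assumed, and all parameter values are hypothetical.

```python
import math

def bound_user_level(d, eps, n):
    """Worst-case form d / (eps * sqrt(n)) of Lemma 6.1, constant set to 1."""
    return d / (eps * math.sqrt(n))

def bound_dem(F_norm, d, alpha, delta, n):
    """Bound of Lemma 6.2: ||F||_2 * sqrt(1.25 d ln(1/delta)) / (alpha sqrt(n))."""
    return F_norm * math.sqrt(1.25 * d * math.log(1 / delta)) / (alpha * math.sqrt(n))

# Hypothetical parameters.
d, n, delta = 16, 10_000, 1e-6
F_norm = math.sqrt(d)            # worst case allowed by the row normalization
eps = 1.0

user_level = bound_user_level(d, eps, n)
dem_equal_budget = bound_dem(F_norm, d, alpha=eps, delta=delta, n=n)
dem_large_alpha = bound_dem(F_norm, d, alpha=10 * eps, delta=delta, n=n)
```

With these values the $d_{\textsf{EM}}$ -DP bound is larger at $\alpha=\varepsilon$ because of the $\sqrt{\ln(1/\delta)}$ term, and becomes smaller once $\alpha$ is sufficiently larger than $\varepsilon$ , in line with the remark above.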

6.1.2. Central Model

In the central model, linear query release has been extensively studied, and optimal algorithms under item-level DP are known (Hardt and Talwar, 2010; Bhaskara et al., 2012; Nikolov et al., 2013). These algorithms can be easily adapted to user-level DP, which will provide the following guarantee:

Lemma 6.3.

(From Thm. 1.3 in (Hardt and Talwar, 2010)) There exists an $(\varepsilon,0)$ -bounded user-level DP algorithm in the central model which produces an estimate $\hat{q}$ such that, for all $\tilde{K}$ ,

\mathbb{E}[\|\hat{q}-q_{f\circ\phi}(\tilde{K})\|_{2}]\leq O\left(\|F\Phi\|_{1\rightarrow 2}\tfrac{\sqrt{d}}{\varepsilon n}\ln\left(\tfrac{k}{d}\right)\right).

To provide $(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP in the central model, we can use PrivEMDLinear with the Gaussian mechanism with scale $\omega=\frac{1}{n\alpha}$ . Following the same approach as in Lemma 6.2, this results in $O(\|F\|_{2}\frac{\sqrt{d\ln\frac{1}{\delta}}}{\alpha n})$ error. Again, this differs from Lemma 6.3 by a factor of $\frac{\varepsilon}{\alpha}$ , and similar observations apply.

6.2. Frequency Estimation

Here, we evaluate the error of PrivEMDItemWise for private frequency estimation, where the goal is to obtain a private estimate $\tilde{F}$ of the (normalized) histogram $\tilde{K}$ . This problem has been extensively studied in privacy (Hay et al., 2009; Xu et al., 2013; Suresh, 2019; Kairouz et al., 2016; Acharya et al., 2018; Chen et al., 2020; Acharya et al., 2023); the high-level goal is to minimize the $\ell_{p}$ distance between $\tilde{F}$ and $\tilde{K}$ . However, when the data domain is a general metric space $\mathcal{X}$ , not all $\ell_{p}$ perturbations to $\tilde{K}$ are equally costly. Thus, we will measure similarity between $\tilde{K}$ and $\tilde{F}$ with $d_{\textsf{EM}}(\tilde{K},\tilde{F})$ , as we do in our privacy definition.

To ease the analysis while still demonstrating the effectiveness of our mechanisms, we fix $\mathcal{X}$ to be the following “clustered” metric space. Let $\mathcal{X}=\mathcal{B}\times\mathcal{C}$ , where $\mathcal{B}=\{b_{1},\ldots,b_{s}\}$ , $\mathcal{C}=\{c_{1},\ldots,c_{t}\}$ and $s\cdot t=k$ . For some $r<\frac{1}{2}$ , the distance is given by the following:

d_{\mathcal{B}\times\mathcal{C}}((b,c),(b^{\prime},c^{\prime}))=\begin{cases}0&\text{if $b=b^{\prime}$ and $c=c^{\prime}$}\\ r&\text{if $b=b^{\prime}$}\\ 1&\text{otherwise}.\end{cases}

We can think of this metric space as a collection of $s$ clusters consisting of the $t$ items $\{(b,c_{1}),\ldots,(b,c_{t})\}$ for each $b\in\mathcal{B}$ . Points in a cluster are more related, being at distance $r$ apart, than items in two different clusters, which are distance $1$ apart. We will assume that privacy is only needed between two items in the same cluster, so we will set $\alpha=\frac{\varepsilon}{r}$ .
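The clustered metric and the choice $\alpha=\frac{\varepsilon}{r}$ can be sketched as follows. The sizes `s`, `t` and the values of `r` and `eps` are hypothetical; the only fact used about metric DP is that the indistinguishability level between two inputs scales as $\alpha$ times their distance.

```python
import itertools

def d_cluster(x, y, r):
    """Clustered metric on B x C: 0 if equal, r within a cluster, 1 across clusters."""
    (b, c), (b2, c2) = x, y
    if b == b2 and c == c2:
        return 0.0
    return r if b == b2 else 1.0

s, t, r = 3, 4, 0.1            # hypothetical sizes; r < 1/2 as in the text
X = list(itertools.product(range(s), range(t)))

# Sanity check: the triangle inequality holds for every triple of items.
for x, y, z in itertools.product(X, repeat=3):
    assert d_cluster(x, z, r) <= d_cluster(x, y, r) + d_cluster(y, z, r) + 1e-12

eps = 1.0
alpha = eps / r                # the choice alpha = eps / r from the text

# Indistinguishability level alpha * d(x, x') for two kinds of input pairs.
same_cluster_level = alpha * d_cluster((0, 0), (0, 1), r)    # recovers eps
cross_cluster_level = alpha * d_cluster((0, 0), (1, 0), r)   # eps / r, much weaker
```

Within a cluster the guarantee matches the level $\varepsilon$ , while items in different clusters are protected only at level $\frac{\varepsilon}{r}$ .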

6.2.1. Algorithms in the Local Model

At a high level, in the local model each user applies a private mechanism $\mathcal{A}:\mathcal{X}\rightarrow\mathcal{Y}$ (with $\mathcal{Y}$ discrete and $|\mathcal{Y}|\geq|\mathcal{X}|$ ) to each sample and releases it. The central server forms an aggregate vector $v\in\mathbb{R}^{\mathcal{Y}}$ . Let $A\in\mathbb{R}^{\mathcal{X}\times\mathcal{Y}}$ denote the transition probability matrix of $\mathcal{A}$ ; we have by linearity of expectation that $\mathbb{E}[v]=\tilde{K}A$ . Assuming that $A$ has a right inverse $B$ , the central server returns the estimate $\tilde{F}=vB$ , which is unbiased. All previous work in distribution estimation under local DP can be expressed in this way (Kairouz et al., 2016; Acharya et al., 2018; Chen et al., 2020; Acharya et al., 2023). We summarize this in Alg. 4.

Data: $K$ , a family of datasets from $n$ users each with size $m$ ; $\mathcal{A}$ , a mechanism from $\mathcal{X}$ to $\mathcal{Y}$ ; $B\in\mathbb{R}^{\mathcal{Y}\times\mathcal{X}}$ , a right inverse of $\mathcal{A}$

for each user $i$ from $1$ to $n$ do
    $L_{i}=\emptyset$ ;
    for $l_{j}\in K_{i}$ do
        $r_{j}=\mathcal{A}(l_{j})$ ;
        Add $r_{j}$ to $L_{i}$ ;
    end for
    Release $\tilde{L}_{i}$ ;
end for
$v=\frac{1}{n}\sum_{i=1}^{n}\tilde{L}_{i}$ ;
$\tilde{F}=vB$ ;
return $\tilde{F}$

Algorithm 4 FreqEstLocal, a general framework for histogram estimation under local DP
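A minimal Python sketch of FreqEstLocal is given below, assuming that $\mathcal{A}$ is plain $k$ -randomized response with a per-sample budget `eps` supplied by the caller. This choice is only a stand-in for illustration: it is neither the Hadamard response nor a mechanism tuned to $d_{\mathcal{X}}$ -DP, and no composition accounting is performed. For $k$ -RR the transition matrix is $A=(p-q)I+qJ$ with $J$ the all-ones matrix, so the right inverse acts on a vector $v$ summing to $1$ as $vB=(v-q)/(p-q)$ .

```python
import math
import random
from collections import Counter

def krr_matrix(k, eps):
    """Transition matrix of k-randomized response (rows: inputs, columns: outputs)."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)   # probability of keeping the item
    q = 1.0 / (math.exp(eps) + k - 1)             # probability of each other item
    return [[p if x == y else q for y in range(k)] for x in range(k)], p, q

def freq_est_local(K, k, eps, rng):
    """FreqEstLocal with A = k-RR: privatize every sample, average, then debias."""
    A, p, q = krr_matrix(k, eps)
    n = len(K)
    v = [0.0] * k
    for K_i in K:                                  # one pass per user
        L_i = [rng.choices(range(k), weights=A[x])[0] for x in K_i]
        counts = Counter(L_i)
        for y in range(k):
            v[y] += counts[y] / (len(L_i) * n)     # average of normalized L~_i
    # Apply the right inverse in closed form: v B = (v - q) / (p - q).
    return [(v_y - q) / (p - q) for v_y in v]

# Hypothetical data: n users, m samples each, drawn from a common distribution.
rng = random.Random(0)
k, n, m = 4, 200, 50
truth = [0.4, 0.3, 0.2, 0.1]
K = [rng.choices(range(k), weights=truth, k=m) for _ in range(n)]
F_tilde = freq_est_local(K, k, eps=1.0, rng=rng)
```

The returned estimate sums to $1$ but may contain small negative entries, since debiasing by a right inverse does not project onto the probability simplex.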

The state-of-the-art approach for frequency estimation is the Hadamard response (Acharya et al., 2018; Chen et al., 2020), which is based on Hadamard matrices (which form a robust encoding of $\mathcal{X}$ ). Specifically, the matrix $A$ is given by $q_{1}\textbf{1}+q_{2}H$ , where $H$ is a Hadamard matrix and $q_{1},q_{2}$ are constants chosen so that $A$ is normalized and each element is proportional to either $e^{\varepsilon}$ or $1$ . This mechanism has the following utility:

Lemma 6.4.

(From Thm. 3.1 in (Chen et al., 2020)) There exists a mechanism $\mathcal{A}$ such that FreqEstLocal satisfies $(\varepsilon,\delta)$ -bounded user-level DP and returns an estimator $\tilde{F}$ such that

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]\leq O\left(\sqrt{\tfrac{k}{mn}}+\sqrt{\tfrac{k^{2}\ln\frac{m}{\delta}}{n\varepsilon^{2}}}\right).

Remarks. In order to adapt the Hadamard response to the user-level setting, we suppose each user applies $\mathcal{A}$ to each sample with privacy budget $\frac{\varepsilon}{\sqrt{m\ln(m/\delta)}}$ , and $(\varepsilon,\delta)$ -user level DP follows from composition (Kairouz et al., 2015). The term $\sqrt{\frac{k}{mn}}$ is a sampling error which does not depend on $\varepsilon$ , and the second $\sqrt{\frac{k^{2}\ln(m/\delta)}{n\varepsilon^{2}}}$ term is the cost of privacy. The cost of privacy usually dominates, and furthermore its dependence on $m$ is not significant. This is because $m$ reduces both the effect of each sample on the final estimate, and the privacy budget per sample, countervailing itself.

Under $d_{\textsf{EM}}$ -DP, we can use a transition probability matrix $A$ that is less noisy. Specifically, each user may apply $\mathcal{A}$ to their dataset using PrivEMDItemWise, and by Thm. 4.3, $\mathcal{A}$ needs to satisfy $(O(\frac{\alpha}{\sqrt{m\ln(me^{\alpha}/\delta)}}),0)$ - $d_{\mathcal{X}}$ -DP. Note that for our choice of $\alpha$ and $\mathcal{X}$ , this is often a less restrictive requirement than $(\frac{\varepsilon}{\sqrt{m\ln(m/\delta)}},0)$ -DP, since $\alpha=\frac{\varepsilon}{r}>\varepsilon$ . We first derive an error bound on FreqEstLocal in terms of $A$ (specifically its right inverse), which we will then optimize.

Theorem 6.5.

For the metric space $\mathcal{X}=\mathcal{B}\times\mathcal{C}$ and any mechanism $\mathcal{A}$ satisfying ( $\alpha_{0},0)$ $d_{\mathcal{X}}$ -DP where $\alpha_{0}=O(\frac{\alpha}{\sqrt{m\ln(me^{\alpha}/\delta)}})$ ( $\alpha_{0}$ is specifically defined in Thm. 4.4), FreqEstLocal satisfies $(\alpha,\delta)$ -bounded $d_{\textsf{EM}}$ -DP in the local model and returns an estimator $\tilde{F}$ such that

(8)

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{F},\tilde{K})]\leq r\sqrt{\frac{st(\|B^{T}\|_{1\rightarrow 2}^{2}-1)}{mn}}+\sqrt{\frac{s(\|P^{T}B^{T}\|_{1\rightarrow 2}^{2}-1)}{mn}},

where $B$ is a right inverse of $\mathcal{A}$ , $P=I_{\mathcal{B}}\otimes 1^{+}_{\mathcal{C}}$ , and $1^{+}_{\mathcal{C}}$ is a column vector of $1$ s indexed by $\mathcal{C}$ .

Remarks. The first term in the RHS of (8) is the cost of equalizing mass within clusters, and the second term is the cost of equalizing the mass across clusters (since the matrix $P$ essentially projects $\mathcal{A}$ to act between clusters). For small $r$ , the first term approaches $0$ , and the latter term may also approach $0$ because $\mathcal{A}$ will not often map a point outside its cluster under $d_{\mathcal{X}}$ -DP (and thus, $\|P^{T}B^{T}\|^{2}_{1\rightarrow 2}-1\rightarrow 0$ ).

Proof Sketch. Our bound forms a transportation plan between $\tilde{F}$ and $\tilde{K}$ by first mapping the mass within each cluster arbitrarily, which incurs at most $r\|\tilde{F}-\tilde{K}\|_{1}$ cost, and then equalizing the mass between clusters, which incurs at most $\|\tilde{F}-\tilde{K}\|_{1}$ cost. Both of the error terms can then be bounded by viewing $\tilde{F}-\tilde{K}$ as the sum of $mn$ independent variables drawn from a Dirichlet distribution with mean $0$ , and applying a standard variance analysis. ∎

We apply Thm. 6.5 with $\mathcal{A}$ being a generalization of $k$ -randomized response (Kairouz et al., 2016) which is adapted to $d_{\mathcal{X}}$ -DP. Specifically, $\textsf{GKRR}_{\alpha_{0}}$ has probabilities given by, for each $(b,c)\in\mathcal{X}$ ,

	$\displaystyle\Pr[\textsf{GKRR}(b,c)=(b,c)]\propto e^{\alpha_{0}},$
	$\displaystyle\Pr[\textsf{GKRR}(b,c)=(b,c^{\prime})]\propto e^{(1-r)\alpha_{0}}\qquad\forall c^{\prime}\neq c,$
	$\displaystyle\Pr[\textsf{GKRR}(b,c)=(b^{\prime},c^{\prime})]\propto 1\qquad\forall b^{\prime}\neq b,\ c^{\prime}.$
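The probabilities above determine the transition matrix of GKRR once they are normalized. The sketch below builds that matrix for hypothetical parameters and checks by brute force that it satisfies the $d_{\mathcal{X}}$ -DP inequality $A[x][y]\leq e^{\alpha_{0}d_{\mathcal{X}}(x,x^{\prime})}A[x^{\prime}][y]$ ; the function names and the dictionary representation are illustrative assumptions.

```python
import itertools
import math

def d_cluster(x, y, r):
    """Clustered metric: 0 if equal, r within a cluster, 1 across clusters."""
    if x == y:
        return 0.0
    return r if x[0] == y[0] else 1.0

def gkrr_matrix(s, t, r, alpha0):
    """Transition probabilities of GKRR on B x C, stored as a dict of dicts."""
    X = list(itertools.product(range(s), range(t)))
    A = {}
    for (b, c) in X:
        w = {}
        for (b2, c2) in X:
            if (b2, c2) == (b, c):
                w[(b2, c2)] = math.exp(alpha0)             # report the true item
            elif b2 == b:
                w[(b2, c2)] = math.exp((1 - r) * alpha0)   # same cluster
            else:
                w[(b2, c2)] = 1.0                          # different cluster
        Z = sum(w.values())            # normalizer, identical for every input
        A[(b, c)] = {y: w_y / Z for y, w_y in w.items()}
    return X, A

s, t, r, alpha0 = 3, 4, 0.1, 2.0       # hypothetical parameters
X, A = gkrr_matrix(s, t, r, alpha0)

# Brute-force check of the d_X-DP constraint over all inputs x, x2 and outputs y.
for x, x2, y in itertools.product(X, repeat=3):
    bound = math.exp(alpha0 * d_cluster(x, x2, r)) * A[x2][y]
    assert A[x][y] <= bound * (1 + 1e-12)
```

The constraint is tight for two inputs in the same cluster when the output equals one of them, where the probability ratio is exactly $e^{r\alpha_{0}}$ .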

Using this mechanism, the higher-order terms of Eq. (8) will approach $0$ with $r$ , as follows:

Theorem 6.6.

For the metric space $\mathcal{X}=\mathcal{B}\times\mathcal{C}$ , FreqEstLocal with the mechanism $\mathcal{A}=\textsf{GKRR}_{\alpha_{0}}$ satisfies $(\alpha,\delta)$ - $d_{\textsf{EM}}$ DP in the local model and returns an estimator $\tilde{F}$ such that

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{F},\tilde{K})]\leq r\sqrt{\frac{st^{3}}{mn}}\left(\frac{e^{\alpha_{0}}+s}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\right)+\sqrt{\frac{s^{2}t^{2}}{mn}}\left(\frac{\sqrt{s+2(e^{\alpha_{0}}-1)}}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t}\right),

where $\alpha_{0}$ is defined in Eq. (6).

Remarks. Specifically, for our choice of $\alpha=\frac{\varepsilon}{r}$ , we have

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]\leq 4\sqrt{\tfrac{k^{3}}{mn}}+64\tfrac{\sqrt{k^{3}}}{\alpha\sqrt{n}}\sqrt{\ln(4m\exp(\alpha)/\delta)}.

Similar to Lemma 6.4, the $\sqrt{\frac{k^{3}}{mn}}$ term is the cost of sampling. The $r\frac{\sqrt{k^{3}}}{\varepsilon\sqrt{n}}$ term is the cost of privacy, and it dominates when $\alpha\leq\sqrt{m}$ . We will compare Thm. 6.6 with Lemma 6.4 when $k,\varepsilon,\alpha<\sqrt{m}$ , so that the cost of privacy dominates. Specifically, the cost of Lemma 6.4 is $O(\sqrt{\frac{k^{2}\ln(m/\delta)}{n\varepsilon^{2}}})$ , and the cost of Thm. 6.6 is $O(\sqrt{\frac{k^{3}}{\alpha^{2}n}\max\{\ln(\frac{m}{\delta}),\alpha\}})$ . Given $\varepsilon$ , the error of Thm. 6.6 will be smaller if

\alpha>\begin{cases}\varepsilon\sqrt{k}&\varepsilon<\frac{1}{\sqrt{k}}\ln(\frac{m}{\delta})\\ \varepsilon^{2}\frac{k}{\ln(m/\delta)}&\text{otherwise}\end{cases}

i.e. if there is a gap between $\alpha,\varepsilon$ of size at least $\sqrt{k}$ . This is possible if $k\ll\frac{1}{r}$ , and for these instances $d_{\textsf{EM}}$ DP offers better utility than user-level DP. In Thm. 6.6, the super-linear factor of $k^{3/2}$ comes from the fact that the $k$ -RR is suboptimal in terms of $k$ (Acharya et al., 2018).
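The threshold on $\alpha$ follows from equating the two privacy costs, and it can be checked numerically. In the sketch below the constants hidden in the $O(\cdot)$ terms are set to $1$ , which is an assumption made only for illustration, and the parameter values are hypothetical.

```python
import math

def cost_user_level(k, n, m, eps, delta):
    """Privacy term of Lemma 6.4 with the constant in O(.) set to 1."""
    return math.sqrt(k ** 2 * math.log(m / delta) / (n * eps ** 2))

def cost_dem(k, n, m, alpha, delta):
    """Privacy term of Thm. 6.6 with the constant in O(.) set to 1."""
    return math.sqrt(k ** 3 * max(math.log(m / delta), alpha) / (alpha ** 2 * n))

def alpha_threshold(k, m, eps, delta):
    """Value of alpha above which the Thm. 6.6 cost is the smaller one."""
    log_term = math.log(m / delta)
    if eps < log_term / math.sqrt(k):
        return eps * math.sqrt(k)
    return eps ** 2 * k / log_term

# Hypothetical values satisfying k, eps, alpha < sqrt(m).
k, n, m, eps, delta = 16, 10_000, 10_000, 1.0, 1e-6
thr = alpha_threshold(k, m, eps, delta)

baseline = cost_user_level(k, n, m, eps, delta)
above = cost_dem(k, n, m, 2 * thr, delta)      # alpha above the threshold
below = cost_dem(k, n, m, thr / 2, delta)      # alpha below the threshold
```

At $\alpha$ equal to the threshold the two costs coincide under this normalization; above it the $d_{\textsf{EM}}$ -DP cost is lower, and below it the user-level cost is lower.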

6.2.2. Algorithms in the Central Model

The Laplace mechanism has been shown to be optimal for many instances of frequency estimation (Dwork et al., 2014). To attain user-level privacy, the baseline Laplace mechanism releases, for each $x\in\mathcal{X}$ , the values $F_{x}=\tilde{K}_{G}(x)+Y$ , where $Y\sim Lap(\frac{1}{n\varepsilon})$ . The distribution function $\tilde{F}$ is then the normalization of $\langle F_{x}:x\in\mathcal{X}\rangle$ . This gives us the following guarantees.

Lemma 6.7.

For the metric space $\mathcal{X}=\mathcal{B}\times\mathcal{C}$ , the Laplace mechanism described above satisfies $(\varepsilon,0)$ -user level DP, and produces an estimate $\tilde{F}$ such that

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]\leq O\left(\tfrac{k}{n\varepsilon}\right).

Again, this utility does not depend on $m$ , since each user contributes $\frac{1}{n}$ fraction of the whole dataset which is independent of $m$ . Consistent with central DP, the error decreases with $\frac{1}{n}$ , which is much faster than the $\frac{1}{\sqrt{n}}$ in the local model.
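The baseline Laplace mechanism described above can be sketched in a few lines. Two points are assumptions rather than statements from the source: "normalization" is taken to mean dividing by the sum of the noisy values, and nothing is done about entries that become negative. The Laplace sample is drawn as the difference of two independent exponential variables, which is a standard identity.

```python
import random

def sample_laplace(scale, rng):
    """Draw from Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_histogram(K_tilde, n, eps, rng):
    """Add Lap(1 / (n * eps)) noise to every bin, then normalize by the sum."""
    scale = 1.0 / (n * eps)
    noisy = [p + sample_laplace(scale, rng) for p in K_tilde]
    total = sum(noisy)
    return [v / total for v in noisy]

# Hypothetical global histogram over k = 4 items from n = 1000 users.
rng = random.Random(0)
K_tilde = [0.4, 0.3, 0.2, 0.1]
F_tilde = laplace_histogram(K_tilde, n=1000, eps=1.0, rng=rng)
```

The noise scale depends on $n$ and $\varepsilon$ but not on $m$ , matching the observation that the utility of this baseline is independent of the number of samples per user.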

It is possible to adapt FreqEstLocal to bounded central $d_{\textsf{EM}}$ -DP by simply pretending to be one user who holds the global dataset $K_{G}$ . The privacy analysis of Thm. 4.4, and the utility analysis of Thm. 6.6 may be combined for the following corollary:

Corollary 6.8.

For the metric space $\mathcal{X}=\mathcal{B}\times\mathcal{C}$ , FreqEstLocal with the mechanism $\mathcal{A}=\textsf{GKRR}_{\alpha_{0}}$ with $\alpha_{0}$ given in (7) satisfies $(\alpha,\delta)$ - $d_{\textsf{EM}}$ DP in the central model and returns an estimator $\tilde{F}$ with error given in (7).

Remarks. In particular

\max_{K}\mathbb{E}[d_{\textsf{EM}}(\tilde{F},\tilde{K})]\leq 4\tfrac{\sqrt{k^{3}}}{\sqrt{mn}}+64\tfrac{\sqrt{k^{3}}}{\alpha n}\sqrt{\ln(4m\exp(\alpha)/\delta)}.

The same sampling error is present, but the cost of privacy is reduced from a $\frac{1}{\sqrt{n}}$ dependence in Thm. 6.6 to just $\frac{1}{n}$ . To compare just the cost of privacy in Corollary 6.8 to Lemma 6.7, we will assume we are in the regime $n\leq\frac{m}{\alpha}$ . Then, the cost in Corollary 6.8 is $O(\frac{\sqrt{k^{3}}}{\alpha n}\sqrt{\max\{\ln(\frac{m}{\delta}),\alpha\}})$ . The error of Corollary 6.8 will be less when

\alpha\geq\begin{cases}\varepsilon\sqrt{k\ln(\frac{m}{\delta})}&\varepsilon\leq\sqrt{\frac{\ln(m/\delta)}{k}}\\ \varepsilon^{2}k&\text{otherwise}\end{cases}

Thus, the utility is improved when $\alpha$ is bigger than $\varepsilon$ by a factor of at least $\sqrt{k}$ , which is achieved when $k\ll\frac{1}{r}$ . One final advantage of Corollary 6.8 is that it may be implemented in the shuffle model of DP, which requires less trust than the central model. This parallels prior results in the shuffle model of DP (Feldman et al., 2022).

7. Related Work

Item-level DP. DP was originally considered at the item-level (Dwork, 2006), where a privacy guarantee is made when one item in the sensitive dataset is changed. Most relevant to our setting are results in distribution estimation (Hay et al., 2009; Xu et al., 2013; Suresh, 2019); these results study more complex estimation problems than frequency. Also, we consider linear query release, for which there is a long line of work (Hardt and Talwar, 2010; Bhaskara et al., 2012; Nikolov et al., 2013; Blum et al., 2013; Li et al., 2015). The mechanism in (Hardt and Talwar, 2010) is often optimal and easy to adapt to our setting; we compare our algorithms with it.
User-level DP. With the vast increase in data collected about users, user-level privacy is gaining more interest (Amin et al., 2019; Narayanan et al., 2022; Bassily and Sun, 2023; Levy et al., 2021; Liu et al., 2020; Cummings et al., 2022). The most relevant work to ours is on user-level private mean estimation (Cummings et al., 2022) and histogram estimation (Liu et al., 2023; Acharya et al., 2023), though these problems are again more complex than the ones we study. Another related area is the problem of deciding the amount of data to pick from each user in cases where the users have different amounts of data (Amin et al., 2019; Liu et al., 2023; Cummings et al., 2022), which is related to our unbounded DP setup. These techniques apply to more specialized settings than our general black-box reduction, and they are not immediately comparable.
Local DP. Local DP has also received a lot of attention recently. The most relevant results to our work are on locally private linear query release (Duchi et al., 2013; Bassily, 2019) and distribution estimation (Duchi et al., 2013; Kairouz et al., 2016; Acharya et al., 2018; Chen et al., 2020; Acharya et al., 2023). We directly compare our work to the optimal algorithms in (Bassily, 2019) and (Chen et al., 2020) for our problems, which can be adapted to user-level DP easily. The other related line of work is privacy amplification from the local model to the central model, given access to a trusted shuffler (Erlingsson et al., 2019; Girgis et al., 2021; Feldman et al., 2022). We extend the state-of-the-art analysis in (Feldman et al., 2022) to general metric DP.
Metric DP. Metric DP was first proposed in (Chatzikokolakis et al., 2013) in the central model. In the local model, this has led to work on releasing numeric data (Roy Chowdhury et al., 2022), location data (Andrés et al., 2013; Bordenabe et al., 2014; Chatzikokolakis et al., 2015; Weggenmann and Kerschbaum, 2021) and text (Feyisetan et al., 2019, 2020; Feyisetan and Kasiviswanathan, 2021; Imola et al., 2022). Unlike these works, we consider privacy in a general metric space. The most related work is that of (Fernandes et al., 2019), which proposes metric DP based on the $d_{\textsf{EM}}$ for releasing text embeddings. As explained in the introduction, we consider a much more general setting than (Fernandes et al., 2019).

8. Conclusion

We have proposed metric DP at the user level using the earth-mover’s distance $d_{\textsf{EM}}$ . This captures both the magnitude and structural aspects of changes in the data, resulting in a tailored privacy semantic. We have designed two novel privacy mechanisms under $d_{\textsf{EM}}$ -DP which improve the utility over standard DP. Additionally, we have shown that general (unbounded) $d_{\textsf{EM}}$ -DP can be reduced to the simpler (bounded) case where all users have the same amount of data. Finally, we have demonstrated that $d_{\textsf{EM}}$ -DP can provably provide improved utility over user-level DP for certain types of linear queries and frequency estimation.

References

Abowd (2018) John M Abowd. 2018. The US Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2867–2867.
Acharya et al. (2023) Jayadev Acharya, Yuhan Liu, and Ziteng Sun. 2023. Discrete distribution estimation under user-level local differential privacy. In International Conference on Artificial Intelligence and Statistics. PMLR, 8561–8585.
Acharya et al. (2018) Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. 2018. Communication Efficient, Sample Optimal, Linear Time Locally Private Discrete Distribution Estimation. CoRR abs/1802.04705 (2018). arXiv:1802.04705 http://arxiv.org/abs/1802.04705
Alvim et al. (2018) Mário Alvim, Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Anna Pazii. 2018. Invited Paper: Local Differential Privacy on Metric Spaces: Optimizing the Trade-Off with Utility. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF). 262–267. https://doi.org/10.1109/CSF.2018.00026
Amin et al. (2019) Kareem Amin, Alex Kulesza, Andres Munoz, and Sergei Vassilvitskii. 2019. Bounding user contributions: A bias-variance trade-off in differential privacy. In International Conference on Machine Learning. PMLR, 263–271.
Andrés et al. (2013) Miguel E Andrés, Nicolás E Bordenabe, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. 2013. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. 901–914.
Balle et al. (2018) Borja Balle, Gilles Barthe, and Marco Gaboardi. 2018. Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in neural information processing systems 31 (2018).
Barthe and Olmedo (2013) Gilles Barthe and Federico Olmedo. 2013. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In International Colloquium on Automata, Languages, and Programming. Springer, 49–60.
Bassily (2019) Raef Bassily. 2019. Linear queries estimation with local differential privacy. In The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 721–729.
Bassily and Sun (2023) Raef Bassily and Ziteng Sun. 2023. User-level private stochastic convex optimization with optimal rates. In International Conference on Machine Learning. PMLR, 1838–1851.
Bhaskara et al. (2012) Aditya Bhaskara, Daniel Dadush, Ravishankar Krishnaswamy, and Kunal Talwar. 2012. Unconditional differentially private mechanisms for linear queries. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing. 1269–1284.
Blum et al. (2013) Avrim Blum, Katrina Ligett, and Aaron Roth. 2013. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM) 60, 2 (2013), 1–25.
Bordenabe et al. (2014) Nicolás E. Bordenabe, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. 2014. Optimal Geo-Indistinguishable Mechanisms for Location Privacy. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (Nov. 2014), 251–262. https://doi.org/10.1145/2660267.2660345 arXiv: 1402.5029.
Chatzikokolakis et al. (2013) Konstantinos Chatzikokolakis, Miguel E Andrés, Nicolás Emilio Bordenabe, and Catuscia Palamidessi. 2013. Broadening the scope of differential privacy using metrics. In PETS.
Chatzikokolakis et al. (2015) Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Marco Stronati. 2015. Constructing elastic distinguishability metrics for location privacy. arXiv preprint arXiv:1503.00756 (2015).
Chen et al. (2020) Wei-Ning Chen, Peter Kairouz, and Ayfer Ozgur. 2020. Breaking the communication-privacy-accuracy trilemma. Advances in Neural Information Processing Systems 33 (2020), 3312–3324.
Cormode et al. (2018) Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. 2018. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data. 1655–1658.
Csiszár (1975) Imre Csiszár. 1975. I-divergence geometry of probability distributions and minimization problems. The annals of probability (1975), 146–158.
Cummings et al. (2022) Rachel Cummings, Vitaly Feldman, Audra McMillan, and Kunal Talwar. 2022. Mean estimation with user-level privacy under data heterogeneity. Advances in Neural Information Processing Systems 35 (2022), 29139–29151.
Duchi et al. (2013) John C Duchi, Michael I Jordan, and Martin J Wainwright. 2013. Local privacy, data processing inequalities, and minimax rates. arXiv preprint arXiv:1302.3203 (2013).
Dwork (2006) Cynthia Dwork. 2006. Differential privacy. In International colloquium on automata, languages, and programming. Springer, 1–12.
Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407.
Erlingsson et al. (2019) Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. 2019. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2468–2479.
Erlingsson et al. (2014) Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. 1054–1067.
Feldman et al. (2022) Vitaly Feldman, Audra McMillan, and Kunal Talwar. 2022. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 954–964.
Fernandes et al. (2019) Natasha Fernandes, Mark Dras, and Annabelle McIver. 2019. Generalised differential privacy for text document processing. In Principles of Security and Trust: 8th International Conference, POST 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6–11, 2019, Proceedings 8. Springer International Publishing, 123–148.
Feyisetan et al. (2020) Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. 2020. Privacy-and utility-preserving textual analysis via calibrated multivariate perturbations. In Proceedings of the 13th international conference on web search and data mining. 178–186.
Feyisetan et al. (2019) Oluwaseyi Feyisetan, Tom Diethe, and Thomas Drake. 2019. Leveraging hierarchical representations for preserving privacy and utility in text. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 210–219.
Feyisetan and Kasiviswanathan (2021) Oluwaseyi Feyisetan and Shiva Kasiviswanathan. 2021. Private release of text embedding vectors. In Proceedings of the First Workshop on Trustworthy Natural Language Processing. 15–27.
Girgis et al. (2021) Antonious M Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, and Peter Kairouz. 2021. On the renyi differential privacy of the shuffle model. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2321–2341.
Givens and Shortt (1984) Clark R Givens and Rae Michael Shortt. 1984. A class of Wasserstein metrics for probability distributions. Michigan Mathematical Journal 31, 2 (1984), 231–240.
Hardt and Talwar (2010) Moritz Hardt and Kunal Talwar. 2010. On the geometry of differential privacy. In Proceedings of the forty-second ACM symposium on Theory of computing. 705–714.
Hay et al. (2009) Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. 2009. Boosting the accuracy of differentially-private histograms through consistency. arXiv preprint arXiv:0904.0942 (2009).
Imola et al. (2022) Jacob Imola, Shiva Kasiviswanathan, Stephen White, Abhinav Aggarwal, and Nathanael Teissier. 2022. Balancing utility and scalability in metric differential privacy. In Uncertainty in Artificial Intelligence. PMLR, 885–894.
Kairouz et al. (2016) Peter Kairouz, Keith Bonawitz, and Daniel Ramage. 2016. Discrete distribution estimation under local privacy. In International Conference on Machine Learning. PMLR, 2436–2444.
Kairouz et al. (2015) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2015. The composition theorem for differential privacy. In International conference on machine learning. PMLR, 1376–1385.
Konig (2001) Dénes Konig. 2001. Theorie der endlichen und unendlichen Graphen. Vol. 72. American Mathematical Soc.
Levy et al. (2021) Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, and Ananda Theertha Suresh. 2021. Learning with User-Level Privacy. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 12466–12479. https://proceedings.neurips.cc/paper_files/paper/2021/file/67e235e7f2fa8800d8375409b566e6b6-Paper.pdf
Li et al. (2015) Chao Li, Gerome Miklau, Michael Hay, Andrew McGregor, and Vibhor Rastogi. 2015. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB journal 24 (2015), 757–781.
Li et al. (2016) N. Li, M. Lyu, D. Su, and W. Yang. 2016. Differential Privacy: From Theory to Practice. Morgan and Claypool. https://ieeexplore.ieee.org/document/7731575
Liu et al. (2020) Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Michael Riley. 2020. Learning discrete distributions: user vs item-level privacy. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 20965–20976. https://proceedings.neurips.cc/paper_files/paper/2020/file/f06edc8ab534b2c7ecbd4c2051d9cb1e-Paper.pdf
Liu et al. (2023) Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, and Marco Gruteser. 2023. Algorithms for bounding contribution for histogram estimation under user-level privacy. In International Conference on Machine Learning. PMLR, 21969–21996.
Narayanan et al. (2022) Shyam Narayanan, Vahab Mirrokni, and Hossein Esfandiari. 2022. Tight and robust private mean estimation with few users. In International Conference on Machine Learning. PMLR, 16383–16412.
Nikolov et al. (2013) Aleksandar Nikolov, Kunal Talwar, and Li Zhang. 2013. The geometry of differential privacy: the sparse and approximate cases. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing. 351–360.
Roy Chowdhury et al. (2022) Amrita Roy Chowdhury, Bolin Ding, Somesh Jha, Weiran Liu, and Jingren Zhou. 2022. Strengthening Order Preserving Encryption with Differential Privacy. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Los Angeles, CA, USA) (CCS ’22). Association for Computing Machinery, New York, NY, USA, 2519–2533. https://doi.org/10.1145/3548606.3560610
Suresh (2019) Ananda Theertha Suresh. 2019. Differentially private anonymized histograms. Advances in Neural Information Processing Systems 32 (2019).
Weggenmann and Kerschbaum (2021) Benjamin Weggenmann and Florian Kerschbaum. 2021. Differential privacy for directional data. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 1205–1222.
Xu et al. (2013) Jia Xu, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, Ge Yu, and Marianne Winslett. 2013. Differentially private histogram publication. The VLDB journal 22 (2013), 797–822.

Appendix A Omitted Technical Details

An alternative characterization of differential privacy is through the hockey-stick divergence (Barthe and Olmedo, 2013). For probability distributions $P,Q$ defined on a space $\mathcal{Y}$ , this is given by the following:

Definition A.1.

Let $\varepsilon,\delta>0$ , and let $P,Q$ be distributions on a space $\mathcal{Y}$ . The Hockey Stick Divergence is given by

D_{e^{\varepsilon}}(P\|Q)=\int_{\mathcal{Y}}\max\left\{\frac{P(y)}{Q(y)}-e^{% \varepsilon},0\right\}Q(y)dy.

It is easy to show that $D_{e^{\varepsilon}}(M(K)\|M(K^{\prime}))\leq\delta$ implies (2), so Definition A.1 provides an alternative way to prove privacy.
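For intuition, on a finite output space the integral in Definition A.1 reduces to a sum. The following Python sketch is illustrative only and is not part of our analysis: it evaluates the divergence for binary randomized response. The names `hockey_stick` and `randomized_response` are our own choices, and we use the equivalent form $\sum_{y}\max\{P(y)-e^{\varepsilon}Q(y),0\}$ so that outcomes with $Q(y)=0$ need no special handling.

```python
import math


def hockey_stick(p, q, eps):
    """Discrete hockey-stick divergence D_{e^eps}(P || Q).

    p and q are lists of probabilities over the same finite output space.
    """
    return sum(max(py - math.exp(eps) * qy, 0.0) for py, qy in zip(p, q))


def randomized_response(eps0):
    """Output distributions of binary randomized response on inputs 0 and 1."""
    keep = math.exp(eps0) / (1.0 + math.exp(eps0))
    return [keep, 1.0 - keep], [1.0 - keep, keep]


p, q = randomized_response(1.0)
print(hockey_stick(p, q, 1.0))  # zero up to floating-point error
print(hockey_stick(p, q, 0.5))  # strictly positive: eps = 0.5 is too small
```

A divergence of zero at $\varepsilon=1$ reflects that this mechanism satisfies pure DP with parameter $1$, while the positive value at $\varepsilon=0.5$ is the smallest $\delta$ for which the bound holds at that $\varepsilon$.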

Definition A.1 satisfies a number of useful properties. First, because it is an $f$ -divergence (Csiszár, 1975), it satisfies the data-processing inequality: for any function $f$ , we have

D_{e^{\varepsilon}}(f(P)\|f(Q))\leq D_{e^{\varepsilon}}(P\|Q).

This property is used to show that DP is invariant to post-processing by any function $f$ . The second property, again holding for all $f$ -divergences, is convexity. This states that for two pairs of distributions $P_{1},P_{2},Q_{1},Q_{2}\in\Delta^{\mathcal{Y}}$ and a real number $\lambda\in[0,1]$ we have

D_{e^{\varepsilon}}(\lambda P_{1}+(1-\lambda)P_{2}\|\lambda Q_{1}+(1-\lambda)Q% _{2})\\ \leq\lambda D_{e^{\varepsilon}}(P_{1}\|Q_{1})+(1-\lambda)D_{e^{\varepsilon}}(P% _{2}\|Q_{2}).

Stated in terms of couplings, we may generalize convexity as follows:

Lemma A.1.

Suppose $X,Y\in\mathcal{X}$ are random variables with probability distributions $P_{X},P_{Y}\in\Delta^{\mathcal{X}}$ . Suppose $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{Y}$ is a randomized function. Then, for any coupling $C\in\mathcal{C}(P_{X},P_{Y})$ , we have

D_{e^{\varepsilon}}(\mathcal{M}(X)\|\mathcal{M}(Y))\leq\mathbb{E}_{(x,y)\sim C% }[D_{e^{\varepsilon}}(\mathcal{M}(x)\|\mathcal{M}(y))].

Proof.

We may write

	$\displaystyle\mathcal{M}(X)$	$\displaystyle=\sum_{x\in\mathcal{X}}P_{X}(x)\mathcal{M}(x)=\sum_{x,y\in% \mathcal{X}}C(x,y)\mathcal{M}(x)$
	$\displaystyle\mathcal{M}(Y)$	$\displaystyle=\sum_{x,y\in\mathcal{X}}C(x,y)\mathcal{M}(y).$

Applying convexity, we have

D_{e^{\varepsilon}}(\mathcal{M}(X)\|\mathcal{M}(Y))\leq\sum_{x,y\in\mathcal{X}% }C(x,y)D_{e^{\varepsilon}}(\mathcal{M}(x)\|\mathcal{M}(y)),

and the claim follows. ∎
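Lemma A.1 can be checked numerically on a small example. The sketch below is a sanity check under our own assumptions rather than a proof: the mechanism is a hypothetical row-stochastic matrix, and we use the independent coupling $C(x,y)=P_{X}(x)P_{Y}(y)$, which is always a valid (though usually not optimal) coupling.

```python
import math


def hockey_stick(p, q, eps):
    """Discrete hockey-stick divergence D_{e^eps}(P || Q)."""
    return sum(max(a - math.exp(eps) * b, 0.0) for a, b in zip(p, q))


def push_forward(prior, mechanism):
    """Output distribution of a row-stochastic mechanism with input drawn from prior."""
    n_out = len(mechanism[0])
    return [
        sum(prior[x] * mechanism[x][y] for x in range(len(prior)))
        for y in range(n_out)
    ]


# Hypothetical mechanism (row x is the distribution M(x)) and input distributions.
mechanism = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
p_x = [0.7, 0.2, 0.1]
p_y = [0.1, 0.3, 0.6]
eps = 0.3

# Left-hand side: divergence between the two output mixtures.
lhs = hockey_stick(push_forward(p_x, mechanism), push_forward(p_y, mechanism), eps)

# Right-hand side: expected pointwise divergence under the independent coupling.
rhs = sum(
    p_x[x] * p_y[y] * hockey_stick(mechanism[x], mechanism[y], eps)
    for x in range(3)
    for y in range(3)
)
print(lhs <= rhs)
```

A coupling that concentrates mass on pairs $(x,y)$ that are close together would give a smaller right-hand side, which is how the lemma is used with the optimal transport coupling later.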

Third, $D_{e^{\varepsilon}}$ satisfies a “weak” triangle inequality (also known as group privacy):

Lemma A.2.

For distributions $P,Q,R$ on $\mathcal{Y}$ , we have $D_{e^{\alpha+\beta}}(P\|R)\leq D_{e^{\alpha}}(P\|Q)+e^{\alpha}D_{e^{\beta}}(Q% \|R)$ .

Proof.

For any $P,Q,\varepsilon$ , we may view $D_{e^{\varepsilon}}(P\|Q)$ through its dual form as

D_{e^{\varepsilon}}(P\|Q)=\sup_{Y\subseteq\mathcal{Y}}(P(Y)-e^{\varepsilon}Q(Y% )).

Thus, let $Y^{*}$ denote the maximal set such that

\displaystyle D_{e^{\alpha+\beta}}(P\|R)=(P(Y^{*})-e^{\alpha+\beta}R(Y^{*})).

We may rewrite this as

	$\displaystyle D_{e^{\alpha+\beta}}(P\|R)$	$\displaystyle=(P(Y^{*})-e^{\alpha}Q(Y^{*}))+e^{\alpha}(Q(Y^{*})-e^{\beta}R(Y^{*}))$
		$\displaystyle\leq D_{e^{\alpha}}(P\|Q)+e^{\alpha}D_{e^{\beta}}(Q\|R),$

showing the claim. ∎
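As a quick sanity check of Lemma A.2, the following sketch draws random triples of distributions on a five-element space and records the smallest slack in the inequality. The helper names and the sampling scheme are our own illustrative assumptions.

```python
import math
import random


def hockey_stick(p, q, eps):
    """Discrete hockey-stick divergence D_{e^eps}(P || Q)."""
    return sum(max(a - math.exp(eps) * b, 0.0) for a, b in zip(p, q))


def random_dist(k, rng):
    """A random probability vector of length k with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def triangle_slack(p, q, r, a, b):
    """Right-hand side minus left-hand side of Lemma A.2 (should be >= 0)."""
    lhs = hockey_stick(p, r, a + b)
    rhs = hockey_stick(p, q, a) + math.exp(a) * hockey_stick(q, r, b)
    return rhs - lhs


rng = random.Random(0)
worst = min(
    triangle_slack(
        random_dist(5, rng), random_dist(5, rng), random_dist(5, rng),
        rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0),
    )
    for _ in range(1000)
)
print(worst >= -1e-12)
```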

Appendix B Omitted Proofs from Section 4

B.1. Proof of Theorem 4.1

See 4.1 For any two distributions $\tilde{K},\tilde{K}^{\prime}$ , we have

	$\displaystyle q_{f}(K)-q_{f}(K^{\prime})$	$\displaystyle=\mathbb{E}_{x\sim\tilde{K}}[f(x)]-\mathbb{E}_{x\sim\tilde{K}^{% \prime}}[f(x)]$
		$\displaystyle=\sum_{x\in\mathcal{X}}f(x)\tilde{K}(x)-\sum_{y\in\mathcal{X}}f(y)\tilde{K}^{\prime}(y).$

Let $C$ be the minimum-transport coupling between $\tilde{K}$ and $\tilde{K}^{\prime}$ , and write $C(x,y)=C_{x}(y)\tilde{K}(x)$ , where $C_{x}$ is the conditional distribution of $y$ given $x$ under $C$ (so that $\sum_{y\in\mathcal{X}}C_{x}(y)=1$ ). By Definition 2.4, we have $\tilde{K}^{\prime}(y)=\sum_{x\in\mathcal{X}}C(x,y)$ , and $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})=\sum_{x,y\in\mathcal{X}}d_{\mathcal{X}}(x,y)C(x,y)$ . Now, we write

	$\displaystyle\sum_{x\in\mathcal{X}}f(x)\tilde{K}(x)-\sum_{y\in\mathcal{X}}f(y)% \tilde{K}^{\prime}(y)$
	$\displaystyle\qquad=\sum_{x\in\mathcal{X}}f(x)\tilde{K}(x)-\sum_{y\in\mathcal{% X}}f(y)\sum_{x\in\mathcal{X}}C(x,y)$
	$\displaystyle\qquad=\sum_{x\in\mathcal{X}}\left(f(x)-\sum_{y\in\mathcal{X}}f(y% )C_{x}(y)\right)\tilde{K}(x)$
	$\displaystyle\qquad=\sum_{x\in\mathcal{X}}\left(\sum_{y\in\mathcal{X}}f(x)C_{x% }(y)-\sum_{y\in\mathcal{X}}f(y)C_{x}(y)\right)\tilde{K}(x)$
	$\displaystyle\qquad=\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{X}}\left(f(x)-f(y% )\right)C_{x}(y)\tilde{K}(x)$
	$\displaystyle\qquad=\sum_{x,y\in\mathcal{X}}\left(f(x)-f(y)\right)C(x,y).$

By the triangle inequality and the fact that $f$ is $\ell$ -Lipschitz, we may write

	$\displaystyle\|q_{f}(K)-q_{f}(K^{\prime})\|$	$\displaystyle\leq\sum_{x,y\in\mathcal{X}}\|f(x)-f(y)\|C(x,y)$
		$\displaystyle\leq\sum_{x,y\in\mathcal{X}}\ell d_{\mathcal{X}}(x,y)C(x,y)$
		$\displaystyle=\ell d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime}).$

This chain of inequalities tells us that $\Delta_{d_{\textsf{EM}}}(q_{f})\leq\ell$ .
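The sensitivity bound can be illustrated on the real line. The sketch below makes several assumptions that are ours and not part of the theorem: items lie in $[0,1]$ with $d_{\mathcal{X}}(x,y)=|x-y|$, both itemsets have the same size (so that matching sorted values is an optimal transport plan in one dimension), and $f(x)=\ell|x-\tfrac{1}{2}|$ is used as an example of an $\ell$ -Lipschitz function.

```python
import random


def emd_1d(xs, ys):
    """Earth-mover's distance between the empirical distributions of two
    equal-size lists of reals under d(x, y) = |x - y| (sorted matching)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size itemsets")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def linear_query(f, items):
    """q_f(K): the average of f over the items of K."""
    return sum(f(x) for x in items) / len(items)


ell = 2.0


def f(x):
    """An example ell-Lipschitz function on [0, 1]."""
    return ell * abs(x - 0.5)


rng = random.Random(1)
K = [rng.random() for _ in range(50)]
K_prime = [rng.random() for _ in range(50)]

gap = abs(linear_query(f, K) - linear_query(f, K_prime))
bound = ell * emd_1d(K, K_prime)
print(gap <= bound)
```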

B.2. Proof of Lemma 4.2

See 4.2 In the local model, by Theorem 4.1, we have $\|q_{f}(K)-q_{f}(K^{\prime})\|\leq\ell$ . Adding noise drawn from $\Gamma(d,\frac{1}{\alpha})$ is known to satisfy $(\alpha,0)$ -DP (Hardt and Talwar, 2010). In the bounded central setting, we have $\|q_{f}(K)-q_{f}(K^{\prime})\|\leq\frac{\ell}{n}$ , and thus we may add noise drawn from $\Gamma(d,\frac{1}{n\alpha})$ .

B.3. Proof of Theorem 4.3

See 4.3 We will first assume the following lemma:

Lemma B.1.

Suppose that $\mathcal{A}$ is an $\alpha_{0}d_{\mathcal{X}}$ -metric DP algorithm, where $d_{\mathcal{X}}\leq 1$ . Let $x_{1}^{0},x_{1}^{1},x_{2},\ldots,x_{m}\in\mathcal{X}$ be a set of inputs such that $d_{\mathcal{X}}(x_{1}^{0},x_{1}^{1})\leq d$ , and let $\delta>0$ be a constant such that $\alpha_{0}\leq\ln(\frac{m}{16\ln(2/\delta)})$ . Then, we have that

D_{e^{\alpha}}(\mathsf{Shuffle}(\mathcal{A}(x_{1}^{0}),\ldots,\mathcal{A}(x_{m% })),\\ \mathsf{Shuffle}(\mathcal{A}(x_{1}^{1}),\mathcal{A}(x_{2}),\ldots,\mathcal{A}(% x_{m})))\leq\delta,

where

\alpha\leq\ln\left(1+\frac{e^{\alpha_{0}d}-1}{e^{\alpha_{0}d}+1}\left(\frac{8% \sqrt{e^{\alpha_{0}}\ln(4/\delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}\right)% \right).

To prove Theorem 4.3, let

S(\mathbf{x}_{i},\mathbf{x}_{m-i}^{\prime})=\mathsf{Shuffle}(\mathcal{A}(x_{1}% ),\ldots,\mathcal{A}(x_{i}),\mathcal{A}(x_{i+1}^{\prime}),\ldots,\mathcal{A}(x% _{m}^{\prime})).

Let $m^{\prime}=\|v\|_{0}$ , and WLOG suppose that $x_{i}=x_{i}^{\prime}$ for $i>m^{\prime}$ . Our goal is to show that

D_{e^{\alpha}}(S(\mathbf{x}_{m^{\prime}},\mathbf{x}_{0}^{\prime})\|S(\mathbf{x% }_{0},\mathbf{x}_{m^{\prime}}^{\prime}))\leq\delta.

By Lemma B.1, we have for each $1\leq i\leq m^{\prime}$ that

D_{\exp(\alpha(i))}(S(\mathbf{x}_{i-1},\mathbf{x}_{m^{\prime}-i+1}^{\prime})\|% S(\mathbf{x}_{i},\mathbf{x}_{m^{\prime}-i}^{\prime}))\leq\frac{\delta}{m^{% \prime}},

where

\alpha(i)=\ln\left(1+\frac{e^{\alpha_{0}d_{\mathcal{X}}(x_{i},x_{i}^{\prime})}% -1}{e^{\alpha_{0}d_{\mathcal{X}}(x_{i},x_{i}^{\prime})}+1}\left(\frac{8\sqrt{e% ^{\alpha_{0}}\ln(4m^{\prime}/\delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}% \right)\right).

Applying Lemma A.2 $m^{\prime}$ times, we see

	$\displaystyle D_{\exp(\alpha(1)+\cdots+\alpha(m^{\prime}))}(S(\mathbf{x}_{m^{\prime}},\mathbf{x}_{0}^{\prime})\|S(\mathbf{x}_{0},\mathbf{x}_{m^{\prime}}^{\prime}))$
	$\displaystyle\leq D_{\exp(\alpha(m^{\prime}))}(S(\mathbf{x}_{m^{\prime}},\mathbf{x}_{0}^{\prime})\|S(\mathbf{x}_{m^{\prime}-1},\mathbf{x}_{1}^{\prime}))$
	$\displaystyle+e^{\alpha(m^{\prime})}D_{\exp(\alpha(m^{\prime}-1))}(S(\mathbf{x}_{m^{\prime}-1},\mathbf{x}_{1}^{\prime})\|S(\mathbf{x}_{m^{\prime}-2},\mathbf{x}_{2}^{\prime}))$
	$\displaystyle+\cdots$
	$\displaystyle+e^{\alpha(2)+\cdots+\alpha(m^{\prime})}D_{\exp(\alpha(1))}(S(\mathbf{x}_{1},\mathbf{x}_{m^{\prime}-1}^{\prime})\|S(\mathbf{x}_{0},\mathbf{x}_{m^{\prime}}^{\prime}))$
	$\displaystyle\leq e^{\alpha(1)+\cdots+\alpha(m^{\prime})}\sum_{i=1}^{m^{\prime}}D_{\exp(\alpha(i))}(S(\mathbf{x}_{i-1},\mathbf{x}_{m^{\prime}-i+1}^{\prime})\|S(\mathbf{x}_{i},\mathbf{x}_{m^{\prime}-i}^{\prime}))$
	$\displaystyle\leq e^{\alpha(1)+\cdots+\alpha(m^{\prime})}\delta.$

We now show that $\alpha(i)$ is a concave function of $d_{\mathcal{X}}(x_{i},x_{i}^{\prime})$ ; to do this we write $\alpha(i)=f(d)=\ln(1+g(d)K)$ , where $g(d)=\frac{e^{d}-1}{e^{d}+1}$ and $K>0$ is a suitable constant. We will show that $f^{\prime\prime}(d)\leq 0$ . Taking derivatives, it is easy to show that $f^{\prime\prime}(d)$ has the same sign as $(1+Kg(d))g^{\prime\prime}(d)-Kg^{\prime}(d)^{2}$ . Thus, we will show that $(1+Kg(d))g^{\prime\prime}(d)\leq Kg^{\prime}(d)^{2}$ . We may write

	$\displaystyle g(d)=1-\frac{2}{e^{d}+1}$
	$\displaystyle g^{\prime}(d)=\frac{2e^{d}}{(e^{d}+1)^{2}}$
	$\displaystyle g^{\prime\prime}(d)=2\frac{(e^{d}+1)^{2}e^{d}-2e^{d}(e^{d}+1)e^{% d}}{(e^{d}+1)^{4}}=2\frac{e^{d}-e^{2d}}{(e^{d}+1)^{3}}.$

Now, we have

	$\displaystyle(1+Kg(d))g^{\prime\prime}(d)\leq Kg^{\prime}(d)^{2}$
	$\displaystyle\Longleftrightarrow(1+K-\frac{2K}{e^{d}+1})2\frac{e^{d}-e^{2d}}{(% e^{d}+1)^{3}}\leq K\frac{4e^{2d}}{(e^{d}+1)^{4}}$
	$\displaystyle\Longleftrightarrow((e^{d}+1)(K+1)-2K)(1-e^{d})\leq 2Ke^{d}$
	$\displaystyle\Longleftrightarrow(Ke^{d}-K+e^{d}+1)(1-e^{d})\leq 2Ke^{d}$
	$\displaystyle\Longleftrightarrow Ke^{d}-K+1-Ke^{2d}+Ke^{d}-e^{2d}\leq 2Ke^{d}$
	$\displaystyle\Longleftrightarrow-K+1-Ke^{2d}-e^{2d}\leq 0$

We are done by observing that $1-e^{2d}\leq 0$ , and $-K-Ke^{2d}\leq 0$ . Having shown concavity, Jensen’s inequality establishes that the sum $\alpha(1)+\cdots+\alpha(m^{\prime})$ is maximized when each distance $d_{\mathcal{X}}(x_{i},x_{i}^{\prime})$ is equal to $\frac{\|v\|_{1}}{\|v\|_{0}}$ . This gives us a bound of

\alpha(1)+\cdots+\alpha(m^{\prime})\\ \leq\|v\|_{0}\ln\left(1+\frac{e^{\alpha_{0}\|v\|_{1}/\|v\|_{0}}-1}{e^{\alpha_{% 0}\|v\|_{1}/\|v\|_{0}}+1}\left(\frac{8\sqrt{e^{\alpha_{0}}\ln(4\|v\|_{0}/% \delta)}}{\sqrt{m}}+\frac{8e^{\alpha_{0}}}{m}\right)\right).
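The concavity step can be illustrated numerically. In the sketch below, `alpha_i` evaluates $\alpha(i)$ as a function of the distance $d_{\mathcal{X}}(x_{i},x_{i}^{\prime})$, using the identity $\frac{e^{z}-1}{e^{z}+1}=\tanh(z/2)$ ; the parameter values and the random distances are hypothetical choices of ours. The check confirms that the sum of the $\alpha(i)$ does not exceed $m^{\prime}$ times $\alpha$ evaluated at the average distance.

```python
import math
import random


def alpha_i(dist, alpha0, m, m_prime, delta):
    """alpha(i) from the proof, as a function of dist = d_X(x_i, x_i')."""
    contraction = math.tanh(alpha0 * dist / 2)  # (e^{a0 d} - 1) / (e^{a0 d} + 1)
    scale = (
        8 * math.sqrt(math.exp(alpha0) * math.log(4 * m_prime / delta)) / math.sqrt(m)
        + 8 * math.exp(alpha0) / m
    )
    return math.log1p(contraction * scale)


rng = random.Random(0)
alpha0, m, delta = 1.0, 10_000, 1e-6
dists = [rng.random() for _ in range(20)]  # hypothetical distances in [0, 1]
m_prime = len(dists)

total = sum(alpha_i(d, alpha0, m, m_prime, delta) for d in dists)
jensen_bound = m_prime * alpha_i(sum(dists) / m_prime, alpha0, m, m_prime, delta)
print(total <= jensen_bound)
```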

B.4. Proof of Lemma B.1

This lemma can be viewed as a generalization of amplification by shuffling, which has the same setup but sets $d=1$ and merely requires that $\mathcal{M}$ satisfy $\varepsilon$ -local DP. We generalize the approach of Feldman et al. (2022), starting with the following preliminary claims.

B.4.1. Preliminary Lemmas

Lemma B.2.

(Generalization of Lemma 3.3 in Feldman et al. (2022)). Let $X=\{x_{1}^{0},x_{1}^{1},x_{2},\ldots,x_{m}\}$ be a set of indices, and for $x\in X$ , let $R(x),Q(x)$ be two families of distributions and $\alpha\in[0,1],\beta\in[0,\frac{1}{2}]$ be coefficients such that

	$\displaystyle R(x_{1}^{0})=(1-\alpha)Q(x_{1}^{0})+\alpha Q(x_{1}^{1})$
	$\displaystyle R(x_{1}^{1})=\alpha Q(x_{1}^{0})+(1-\alpha)Q(x_{1}^{1})$
	$\displaystyle R(x_{j})=\beta Q(x_{1}^{0})+\beta Q(x_{1}^{1})+(1-2\beta)Q(x_{j}% )~{}~{}~{}\forall j\geq 2.$

Then, there exists a post-processing mechanism $\mathcal{S}$ such that

	$\displaystyle\mathsf{Shuffle}(R(x_{1}^{0}),R(x_{2}),\ldots,R(x_{m}))$	$\displaystyle=\mathcal{S}(A+1-\Delta,C-A+\Delta)\ \ \ \ \ \ \ \text{and}$
	$\displaystyle\mathsf{Shuffle}(R(x_{1}^{1}),R(x_{2}),\ldots,R(x_{m}))$	$\displaystyle=\mathcal{S}(A+\Delta,C-A+1-\Delta),$

where $C\sim\text{Bin}(m-1,2\beta)$ , $A\sim\text{Bin}(C,\frac{1}{2})$ , and $\Delta\sim\text{Bernoulli}(\alpha)$ , and $\mathsf{Shuffle}$ is a uniformly random shuffle.

Proof.

Let $Y_{1}^{0},Y_{1}^{1},Y_{2},\ldots,Y_{m}$ be distributions where $Y_{1}^{b}$ is defined over $\{0,1\}$ and satisfies $Y_{1}^{0}(0)=1-\alpha$ and $Y_{1}^{0}(1)=\alpha$ (with reversed probabilities if $b=1$ ), and $Y_{j}$ for $j\geq 2$ is defined over $\{0,1,2\}$ and satisfies $Y_{j}(0)=Y_{j}(1)=\beta$ and $Y_{j}(2)=1-2\beta$ . Let $F$ be a function returning a distribution satisfying

F_{j}(v)=\begin{cases}Q(x_{1}^{0})&v=0\\ Q(x_{1}^{1})&v=1\\ Q(x_{j})&\text{otherwise}\end{cases}

Observe that by definition, the following probability distributions are equal for $b\in\{0,1\}$ :

R(x_{1}^{b}),R(x_{2}),\ldots,R(x_{m})=F_{1}(Y_{1}^{b}),F_{2}(Y_{2}),\ldots,F_{% m}(Y_{m}).

Let $\textbf{0}(Y_{1},\ldots,Y_{m})$ denote the number of indices $j$ such that $Y_{j}=0$ , and define $\textbf{1}(Y_{1},\ldots,Y_{m})$ similarly. We will show that there exists a post-processing function $\mathcal{S}$ such that, for both $b\in\{0,1\}$ , we have

(9)

\mathsf{Shuffle}(F_{1}(Y_{1}^{b}),F_{2}(Y_{2}),\ldots,F_{m}(Y_{m}))\\ =\mathcal{S}(\textbf{0}(Y_{1}^{b},\ldots,Y_{m}),\textbf{1}(Y_{1}^{b},\ldots,Y_% {m})).

We will do this by conditioning on the event $E_{u,v}$ that

(\textbf{0}(Y_{1}^{b},Y_{2},\ldots,Y_{m}),\textbf{1}(Y_{1}^{b},Y_{2},\ldots,Y_% {m}))=(u,v),

where $u,v\in\mathbb{N}$ satisfy $1\leq u+v\leq m$ . Now, define the vector $r=\mathsf{Shuffle}(F_{1}(Y_{1}^{b}),F_{2}(Y_{2}),\ldots,F_{m}(Y_{m}))$ . Conditioned on $E_{u,v}$ , $r$ is distributed according to the following process: First, select a random partition $U\sqcup V\sqcup W=[m]$ such that $|U|=u$ and $|V|=v$ , corresponding to the indices (after shuffling) where $Y_{1}^{b},Y_{2},\ldots,Y_{m}$ are equal to $0$ , $1$ , or $2$ . Next, let $\pi$ be a random injection from $W$ to $[m]\setminus\{1\}$ . Then, $r$ is distributed according to:

(10)		$\displaystyle r(u)=Q(x_{1}^{0})\ \ \ \forall u\in U$
(11)		$\displaystyle r(v)=Q(x_{1}^{1})\ \ \ \forall v\in V$
(12)		$\displaystyle r(w)=Q(x_{\pi(w)})\ \ \ \forall w\in W.$

The above process is independent of $\alpha,\beta$ given $E_{u,v}$ . In particular, it does not care whether we replace $\alpha$ with $1-\alpha$ , and thus it serves as our process $\mathcal{S}$ satisfying (9) for both values of $b$ . Having established this, it is easy to show that $\textbf{0}(Y_{1}^{0},\ldots,Y_{m})=A+1-\Delta$ , $\textbf{1}(Y_{1}^{0},\ldots,Y_{m})=C-A+\Delta$ for $b=0$ , and $\textbf{0}(Y_{1}^{1},\ldots,Y_{m})=A+\Delta$ , $\textbf{1}(Y_{1}^{1},\ldots,Y_{m})=C-A+1-\Delta$ for $b=1$ . ∎

Having reduced the shuffling problem to a divergence between two fixed probability distributions, we follow the method of (Feldman et al., 2022) to compute this divergence. We use the following two results:

Lemma B.3.

(Restatement of Lemma A.1 from (Feldman et al., 2022)): Suppose $p\geq\frac{16\ln(2/\delta)}{m}$ , $C\sim Bin(m-1,p)$ and $A\sim Bin(C,\frac{1}{2})$ . Define $P=(A+1,C-A)$ and $Q=(A,C-A+1)$ . Then, $D_{e^{\varepsilon}}(P\|Q)\leq\delta$ , where

\varepsilon=\ln\left(1+\frac{8\sqrt{\ln(4/\delta)}}{\sqrt{pm}}+\frac{8}{pm}\right).

The next result, advanced joint convexity, originally appeared in the privacy amplification by sampling literature and can be used to improve the parameter $\varepsilon$ when computing $D_{\alpha}(P\|Q)$ between two distributions which are nearly the same.

Lemma B.4.

(Restatement of Theorem 2 from (Balle et al., 2018)) Let $P,Q$ be probability distributions satisfying $P=\nu M+(1-\nu)N$ and $Q=\nu M^{\prime}+(1-\nu)N$ for distributions $M,M^{\prime},N$ and $\nu\in[0,1]$ . Given $\alpha\geq 1$ , define $\alpha^{\prime}=1+\nu(\alpha-1)$ and $\beta=\frac{\alpha^{\prime}}{\alpha}$ . Then,

D_{\alpha^{\prime}}(P\|Q)\leq\nu D_{\alpha}(M\|(1-\beta)N+\beta M^{\prime}).

Finally, we require a result from local DP:

Lemma B.5.

(Restatement of Theorem 2.5 from (Kairouz et al., 2015)) Let $P,Q$ be two distributions and $\alpha\geq 1$ be a parameter such that $D_{\alpha}(P\|Q)=0$ . Then, there exist distributions $M,N$ such that

	$\displaystyle P=\frac{\alpha}{\alpha+1}M+\frac{1}{\alpha+1}N$
	$\displaystyle Q=\frac{1}{\alpha+1}M+\frac{\alpha}{\alpha+1}N.$
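To make the decomposition concrete, the sketch below solves the two mixture equations $P=\frac{\alpha}{\alpha+1}M+\frac{1}{\alpha+1}N$ and $Q=\frac{1}{\alpha+1}M+\frac{\alpha}{\alpha+1}N$ for $M$ and $N$ on a finite space. This is one possible construction under our own assumptions, namely $\alpha>1$ and both $\alpha P\geq Q$ and $\alpha Q\geq P$ pointwise; the function names and the example distributions are hypothetical.

```python
def decompose(p, q, alpha):
    """Return (M, N) solving the two mixture equations for P and Q.

    Assumes alpha > 1 and that alpha*P >= Q and alpha*Q >= P hold pointwise,
    which makes M and N valid probability vectors.
    """
    if alpha <= 1:
        raise ValueError("this construction needs alpha > 1")
    m = [(alpha * a - b) / (alpha - 1) for a, b in zip(p, q)]
    n = [(alpha * b - a) / (alpha - 1) for a, b in zip(p, q)]
    if min(m + n) < 0:
        raise ValueError("P and Q are not within a factor alpha of each other")
    return m, n


def mix(weight, m, n):
    """The mixture weight * M + (1 - weight) * N."""
    return [weight * a + (1 - weight) * b for a, b in zip(m, n)]


p = [0.5, 0.3, 0.2]
q = [0.4, 0.35, 0.25]
alpha = 2.0

m_dist, n_dist = decompose(p, q, alpha)
p_rebuilt = mix(alpha / (alpha + 1), m_dist, n_dist)
q_rebuilt = mix(1 / (alpha + 1), m_dist, n_dist)
print(m_dist, n_dist)
```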

With these results in order, we are ready to complete the proof.

B.4.2. Completing the proof of Lemma B.1

Using the definition of $d_{\mathcal{X}}$ -DP and the fact that $d_{\mathcal{X}}\leq 1$ , we have

	$\displaystyle D_{\exp(\varepsilon_{0}d)}(\mathcal{A}(x_{1}^{0})\|\mathcal{A}(x_{1}^{1}))=0$
	$\displaystyle D_{\exp(\varepsilon_{0})}(\mathcal{A}(x_{1}^{0})\|\mathcal{A}(x_{j}))=0~{}~{}~{}\forall j\geq 2$
	$\displaystyle D_{\exp(\varepsilon_{0})}(\mathcal{A}(x_{1}^{1})\|\mathcal{A}(x_{j}))=0~{}~{}~{}\forall j\geq 2.$

Applying Lemma B.5 to the first equation, we obtain

(13)		$\displaystyle\mathcal{A}(x_{1}^{0})=(1-\beta)Q(x_{1}^{0})+\beta Q(x_{1}^{1})$
(14)		$\displaystyle\mathcal{A}(x_{1}^{1})=\beta Q(x_{1}^{0})+(1-\beta)Q(x_{1}^{1})$

where $\beta=\frac{1}{1+\exp(\varepsilon_{0}d)}$ . Applying the lemma to the second and third sets of equations, we obtain

(15)		$\displaystyle\mathcal{A}(x_{1}^{0})=(1-\gamma)R(x_{1}^{0},x_{j})+\gamma R^{% \prime}(x_{1}^{0},x_{j})~{}~{}~{}\forall j\geq 2$
(16)		$\displaystyle\mathcal{A}(x_{j})=\gamma R(x_{1}^{0},x_{j})+(1-\gamma)R^{\prime}% (x_{1}^{0},x_{j})~{}~{}~{}\forall j\geq 2$
(17)		$\displaystyle\mathcal{A}(x_{1}^{1})=(1-\gamma)R(x_{1}^{1},x_{j})+\gamma R^{% \prime}(x_{1}^{1},x_{j})~{}~{}~{}\forall j\geq 2$
(18)		$\displaystyle\mathcal{A}(x_{j})=\gamma R(x_{1}^{1},x_{j})+(1-\gamma)R^{\prime}% (x_{1}^{1},x_{j})~{}~{}~{}\forall j\geq 2.$

where $\gamma=\frac{1}{1+\exp(\varepsilon_{0})}$ . Eliminating $R(x_{1}^{0},x_{j})$ from (15) and (16), we obtain that

(19)

\displaystyle\mathcal{A}(x_{j})=\frac{\gamma}{1-\gamma}\mathcal{A}(x_{1}^{0})+% \frac{1-2\gamma}{1-\gamma}R^{\prime}(x_{1}^{0},x_{j})~{}~{}~{}\forall j\geq 2,

and likewise 17 and 18 imply

(20)

\displaystyle\mathcal{A}(x_{j})=\frac{\gamma}{1-\gamma}\mathcal{A}(x_{1}^{1})+% \frac{1-2\gamma}{1-\gamma}R^{\prime}(x_{1}^{1},x_{j})~{}~{}~{}\forall j\geq 2.

Taking the average of 19 and 20, we obtain

(21)

\displaystyle\mathcal{A}(x_{j})=\frac{\gamma}{2(1-\gamma)}\mathcal{A}(x_{1}^{0% })+\frac{\gamma}{2(1-\gamma)}\mathcal{A}(x_{1}^{1})+\frac{1-2\gamma}{1-\gamma}% Q(x_{j})~{}~{}~{}\forall j\geq 2,

where $Q(x_{j})=\frac{1}{2}R^{\prime}(x_{1}^{0},x_{j})+\frac{1}{2}R^{\prime}(x_{1}^{1% },x_{j})$ . Now, equations 13 and 14 imply that

\mathcal{A}(x_{1}^{0})+\mathcal{A}(x_{1}^{1})=Q(x_{1}^{0})+Q(x_{1}^{1}).

This implies

(22)

\displaystyle\mathcal{A}(x_{j})=\frac{\gamma}{2(1-\gamma)}Q(x_{1}^{0})+\frac{% \gamma}{2(1-\gamma)}Q(x_{1}^{1})+\frac{1-2\gamma}{1-\gamma}Q(x_{j})~{}~{}~{}% \forall j\geq 2.

Applying Lemma B.2, there exists a function $S$ such that

	$\displaystyle\mathsf{Shuffle}(\mathcal{A}(x_{1}^{0}),\mathcal{A}(x_{2}),\ldots% ,\mathcal{A}(x_{m}))=S(A+1-\Delta,C-A+\Delta)$
	$\displaystyle\mathsf{Shuffle}(\mathcal{A}(x_{1}^{1}),\mathcal{A}(x_{2}),\ldots% ,\mathcal{A}(x_{m}))=S(A+\Delta,C-A+1-\Delta),$

where $C\sim Bin(m-1,\frac{\gamma}{1-\gamma})=Bin(m-1,e^{-{\varepsilon_{0}}})$ , $A\sim Bin(C,\frac{1}{2})$ , and $\Delta\sim Bernoulli(\beta)$ . By the post-processing inequality, we have for any $\alpha\geq 1$ that

D_{\alpha}(\mathsf{Shuffle}(\mathcal{A}(x_{1}^{0}),\mathcal{A}(x_{2}),\ldots,\mathcal{A}(x_{m}))\|\mathsf{Shuffle}(\mathcal{A}(x_{1}^{1}),\mathcal{A}(x_{2}),\\ \ldots,\mathcal{A}(x_{m})))\leq D_{\alpha}((A+1-\Delta,C-A+\Delta)\|(A+\Delta,C-A+1-\Delta)).

Observe we can write

	$\displaystyle(A+1-\Delta,C-A+\Delta)=(1-\beta)(A+1,C-A)+\beta(A,C-A+1)$
	$\displaystyle(A+\Delta,C-A+1-\Delta)=\beta(A+1,C-A)+(1-\beta)(A,C-A+1).$

Define $X=(A+1,C-A)$ and $Y=(A,C-A+1)$ . We can rewrite the above as

	$\displaystyle(A+1-\Delta,C-A+\Delta)=2\beta\frac{X+Y}{2}+(1-2\beta)X$
	$\displaystyle(A+\Delta,C-A+1-\Delta)=2\beta\frac{X+Y}{2}+(1-2\beta)Y.$

Applying Lemma B.4, we have

D_{\alpha^{\prime}}((A+1-\Delta,C-A+\Delta)\|(A+\Delta,C-A+1-\Delta))\\ \leq(1-2\beta)D_{\alpha}(X\|(1-\eta)(\tfrac{X+Y}{2})+\eta Y),

where $\alpha^{\prime}=1+(1-2\beta)(\alpha-1)$ and $\eta=\frac{\alpha^{\prime}}{\alpha}$ . By convexity, the RHS above is at most

D_{\alpha^{\prime}}((A+1-\Delta,C-A+\Delta)\|(A+\Delta,C-A+1-\Delta))\leq(1-2% \beta)D_{\alpha}(X\|Y).

Now, we finally set $\alpha=1+\frac{8\sqrt{\exp(\varepsilon_{0})\ln(4/\delta)}}{\sqrt{m}}+\frac{8\exp(\varepsilon_{0})}{m}$ . Lemma B.3 with $p=e^{-\varepsilon_{0}}$ (using the assumption that $\varepsilon_{0}\leq\ln(\frac{m}{16\ln(2/\delta)})$ ) implies $D_{\alpha}(X\|Y)\leq\delta$ . From this, we obtain our desired result that

D_{\alpha^{\prime}}(\mathsf{Shuffle}(\mathcal{A}(x_{1}^{0}),\mathcal{A}(x_{2})% ,\ldots,\mathcal{A}(x_{m}))\|\mathsf{Shuffle}(\mathcal{A}(x_{1}^{1}),\mathcal{% A}(x_{2}),\\ \ldots,\mathcal{A}(x_{m})))\leq(1-2\beta)D_{\alpha}(X\|Y)\leq\delta,

where

\alpha^{\prime}=1+\frac{e^{\varepsilon_{0}d}-1}{e^{\varepsilon_{0}d}+1}\left(% \frac{8\sqrt{e^{\varepsilon_{0}}\ln(4/\delta)}}{\sqrt{m}}+\frac{8e^{% \varepsilon_{0}}}{m}\right).
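The resulting bound is easy to evaluate. The sketch below computes $\ln\alpha^{\prime}$ for given $\varepsilon_{0}$, $d$, $m$ and $\delta$, and rejects parameters that violate the assumption $\varepsilon_{0}\leq\ln(\frac{m}{16\ln(2/\delta)})$ . The function name `amplified_epsilon` and the example parameters are our own; the factor $\frac{e^{\varepsilon_{0}d}-1}{e^{\varepsilon_{0}d}+1}$ is computed as $\tanh(\varepsilon_{0}d/2)$ .

```python
import math


def amplified_epsilon(eps0, d, m, delta):
    """ln(alpha') from Lemma B.1 for inputs x_1^0, x_1^1 at distance at most d."""
    if eps0 > math.log(m / (16 * math.log(2 / delta))):
        raise ValueError("eps0 is too large for this m and delta")
    contraction = math.tanh(eps0 * d / 2)  # (e^{eps0 d} - 1) / (e^{eps0 d} + 1)
    shuffle_term = (
        8 * math.sqrt(math.exp(eps0) * math.log(4 / delta)) / math.sqrt(m)
        + 8 * math.exp(eps0) / m
    )
    return math.log1p(contraction * shuffle_term)


m, delta = 100_000, 1e-6
for d in (0.0, 0.1, 0.5, 1.0):
    print(d, amplified_epsilon(1.0, d, m, delta))
```

The printed values increase with $d$ and vanish at $d=0$, which is the distance-dependent behaviour that distinguishes this bound from standard amplification by shuffling.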

B.5. Proof of Theorem 4.4

See 4.4

First, consider the local model. Fix any two itemsets $K=\{x_{1},\ldots,x_{m}\}$ and $K^{\prime}=\{x_{1}^{\prime},\ldots,x_{m}^{\prime}\}$ such that $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\leq w$ . By Lemma 2.1, there exists a permutation $\pi:[m]\rightarrow[m]$ such that

\sum_{i=1}^{m}d_{\mathcal{X}}(x_{i},x_{\pi(i)}^{\prime})=mw.

Let

(23)		$\displaystyle\tilde{L}=\mathrm{Shuffle}(\mathcal{A}(x_{1}),\ldots,\mathcal{A}(% x_{m}))$
(24)		$\displaystyle\tilde{L}^{\prime}=\mathrm{Shuffle}(\mathcal{A}(x_{\pi(1)}^{\prime}),\ldots,\mathcal{A}(x_{\pi(m)}^{\prime})).$

By Theorem 4.3, we know that $D_{\exp(\alpha(w))}(\tilde{L}\|\tilde{L}^{\prime})\leq\delta e^{\alpha(w)}$ , where $\alpha(w)=h(m;m,mw)$ . The final privacy parameters for a fixed $w$ will be $\frac{\alpha(w)}{w}$ and $\delta e^{\alpha(w)}$ ; the worst-case privacy parameters are thus $\sup_{w\in[0,1]}\frac{\alpha(w)}{w}$ and $\sup_{w\in[0,1]}\delta e^{\alpha(w)}$ . Since $\alpha(w)$ is an increasing function, the latter term reduces to $\delta e^{\alpha(1)}$ .

In the bounded central model, the same logic applies, except that $\tilde{L},\tilde{L}^{\prime}$ have size $mn$ , differ in only $m$ coordinates, and

\sum_{i=1}^{mn}d_{\mathcal{X}}(x_{i},x_{\pi(i)}^{\prime})=mw.

We apply Theorem 4.3 to obtain $D_{\exp(\alpha(w))}(\tilde{L}\|\tilde{L}^{\prime})\leq\delta e^{\alpha(w)}$ , where $\alpha(w)=h(mn;m,mw)$ , and we complete the proof similarly.

Appendix C Omitted Proofs from Section 5

C.1. Proof of Lemma 5.2

See 5.2 For $i=1,\ldots,s$ , define $X_{i}=d_{\mathcal{X}}(x_{i},y_{i})$ , and observe that $d_{\textsf{EM}}(\tilde{L},\tilde{L}^{\prime})\leq\frac{1}{s}(X_{1}+\cdots+X_{s})$ . Now, let $\mu$ denote $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})$ . Observe each $X_{i}$ is i.i.d. and satisfies $\mathbb{E}[X_{i}]=\mu$ and $0\leq X_{i}\leq 1$ . Due to the last two facts, we have $\mathbb{E}[X_{i}^{2}]\leq\mu$ . By Bernstein’s inequality, we have, for all $t\geq 0$ ,

\Pr\left[X_{1}+\cdots+X_{s}-s\mu\geq t\right]\leq e^{-t^{2}/(2(v+bt/3))},

where $v=\sum_{i=1}^{s}\mathbb{E}[X_{i}^{2}]\leq s\mu$ and $b=1$ . By setting

t=\max\{\sqrt{4s\mu\ln(1/\delta)},\tfrac{4}{3}\ln(1/\delta)\},

we ensure that the probability is at most $\delta$ . We have

s\mu+t\leq s\mu+2\sqrt{s\mu\ln(1/\delta)}+\tfrac{4}{3}\ln(1/\delta)\leq(1+% \sqrt{2})s\mu+(\tfrac{4}{3}+\sqrt{2})\ln\tfrac{1}{\delta}.

Finally, since $\tfrac{4}{3}+\sqrt{2}\leq 3$ , we have

\Pr[d_{\textsf{EM}}(\tilde{L},\tilde{L}^{\prime})\geq(1+\sqrt{2})\mu+\tfrac{3}% {s}\ln\tfrac{1}{\delta}]\leq\\ \Pr[X_{1}+\cdots+X_{s}\geq(1+\sqrt{2})s\mu+3\ln\tfrac{1}{\delta}]\leq\delta.
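The choice of $t$ can be verified numerically. The sketch below is a sanity check over a hypothetical grid of values for $s$, $\mu$ and $\delta$ (our choice): it confirms that the Bernstein tail evaluated at $t$ with $v=s\mu$ and $b=1$ is at most $\delta$, and that $s\mu+t$ is at most $(1+\sqrt{2})s\mu+3\ln\frac{1}{\delta}$ .

```python
import math


def bernstein_threshold(s, mu, delta):
    """The deviation t chosen in the proof of Lemma 5.2."""
    log_term = math.log(1 / delta)
    return max(math.sqrt(4 * s * mu * log_term), 4 * log_term / 3)


def bernstein_tail(t, v, b=1.0):
    """Bernstein's bound exp(-t^2 / (2 (v + b t / 3))) on the upper tail."""
    return math.exp(-t * t / (2 * (v + b * t / 3)))


checks = []
for s in (10, 100, 1000):
    for mu in (0.0, 0.01, 0.5, 1.0):
        for delta in (1e-2, 1e-6):
            t = bernstein_threshold(s, mu, delta)
            # Small tolerances guard against floating-point rounding only.
            tail_ok = bernstein_tail(t, v=s * mu) <= delta * (1 + 1e-9)
            simplified = (1 + math.sqrt(2)) * s * mu + 3 * math.log(1 / delta)
            checks.append(tail_ok and s * mu + t <= simplified + 1e-9)

print(all(checks))
```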

C.2. Proof of Theorem 5.3

See 5.3 First, we will consider the local model. Let $K,K^{\prime}$ denote two datasets such that $d_{\textsf{EM}}(\tilde{K},\tilde{K}^{\prime})\leq r$ . Let $L$ , $L^{\prime}$ denote the set of $s$ samples when $K$ (resp. $K^{\prime}$ ) is used. Our goal is to show that $D_{\exp(\varepsilon)}(\mathcal{M}(L)\|\mathcal{M}(L^{\prime}))\leq\delta$ . Observe we may define the objects $\mathbf{L},\mathbf{L}^{\prime}\in\Delta^{\mathcal{X}^{s}}$ to be the probability distributions of $L,L^{\prime}$ (which lie in $\mathcal{X}^{s}$ ). By Lemma A.1, for any coupling $C\in\mathcal{C}(\mathbf{L},\mathbf{L}^{\prime})$ , we have

\displaystyle D_{\exp(\varepsilon)}(\mathcal{M}(L)\|\mathcal{M}(L^{\prime}))

\displaystyle\leq\mathbb{E}_{(L,L^{\prime})\sim C}[D_{\exp(\varepsilon)}(% \mathcal{M}(L)\|\mathcal{M}(L^{\prime}))].

Let $A$ denote the event that we have $d_{\textsf{EM}}(\tilde{L},\tilde{L}^{\prime})\leq(1+\sqrt{2})r+\frac{3}{s}\ln\tfrac{1}{\delta}$ . When $A$ holds, then $D_{\exp(\varepsilon)}(\mathcal{M}(L)\|\mathcal{M}(L^{\prime}))\leq\delta$ by assumption. When $A$ does not hold, then trivially $D_{\exp(\varepsilon)}(\mathcal{M}(L)\|\mathcal{M}(L^{\prime}))\leq 1$ . Conditioning on $A$ in the above expectation, we have

	$\displaystyle\mathbb{E}_{(L,L^{\prime})\sim C}[D_{\exp(\varepsilon)}(\mathcal{M}(L)\|\mathcal{M}(L^{\prime}))]$	$\displaystyle\leq\delta\Pr[A]+\Pr[\overline{A}]$
		$\displaystyle\leq\delta+\Pr[\overline{A}].$

Now, let $C^{*}\in\Delta^{\mathcal{X}\times\mathcal{X}}$ denote the optimal coupling between $\tilde{K},\tilde{K}^{\prime}$ . We will take $C=(C^{*})^{s}\in\Delta^{\mathcal{X}^{s}\times\mathcal{X}^{s}}$ , the $s$ -fold Kronecker product of $C^{*}$ . Observe this is indeed a coupling between $\mathbf{L},\mathbf{L^{\prime}}$ , and each coordinate of $(L,L^{\prime})\sim C$ is simply a sample from $C^{*}$ . Thus, the probability of the event $A$ above is

\Pr[A]=\Pr_{(L,L^{\prime})\sim(C^{*})^{s}}[d_{\textsf{EM}}(\tilde{L},\tilde{L}% ^{\prime})\leq(1+\sqrt{2})r+\frac{3}{s}\ln\frac{1}{\delta}],

where the notation $(L,L^{\prime})\sim(C^{*})^{s}$ indicates that $L=\{x_{1},\ldots,x_{s}\}$ and $L^{\prime}=\{y_{1},\ldots,y_{s}\}$ , and each $(x_{i},y_{i})\sim C^{*}$ . By Lemma 5.2, we know that $\Pr[A]\geq 1-\delta$ , and thus the above expectation is at most $2\delta$ . This proof may be generalized easily to the central model.

Appendix D Omitted Proofs from Section 6

D.1. Proof of Lemma 6.2

See 6.2 As the sensitivity of the query is bounded by $\|F\|_{2}$ , it is easy to show (e.g. (Dwork et al., 2014)) that adding $d$ -dimensional Gaussian noise with width $\|F\|_{2}\frac{r\sqrt{1.25\ln\frac{1}{\delta}}}{\alpha}$ in each coordinate will satisfy $(\frac{\alpha}{r},\delta)$ local $d_{\textsf{EM}}$ -DP. The standard deviation in each coordinate of $\hat{q}$ is thus $\|F\|_{2}\frac{r\sqrt{1.25\ln\frac{1}{\delta}}}{\alpha\sqrt{n}}$ , and this gives the desired expected error.

D.2. Proof of Theorem 6.5

See 6.5 First, we will introduce notation. For a cluster label $b\in\mathcal{B}$ , let $\mathcal{X}[b]\subseteq\mathcal{X}$ denote the elements of $\mathcal{X}$ in cluster $b$ . Define $\tilde{F}[b]\in\mathbb{R}^{\mathcal{B}\times\mathcal{C}}$ to be the indices of $\tilde{F}$ in $\mathcal{X}[b]$ (so that indices outside $\mathcal{X}[b]$ are zeroed out). Define $\tilde{K}[b]$ similarly, and observe that $\tilde{F}[b],\tilde{K}[b]$ are not normalized.

For any estimate $\tilde{F}$ , consider the following transportation plan from $\tilde{F}$ to $\tilde{K}$ : For each $b\in\mathcal{B}$ , transfer $\tilde{F}[b]$ to $\tilde{K}[b]$ arbitrarily, and put any excess weight in the bin $(b,c^{\prime})$ for an arbitrary $c^{\prime}\in\mathcal{C}$ . The cost incurred by this is at most $r\|\tilde{F}[b]-\tilde{K}[b]\|_{1}+r|\mu(\tilde{F}[b])-\mu(\tilde{K}[b])|$ , where $\mu(\cdot)$ denotes total mass of its argument. Finally, equalize the weights in the coordinates $\{(b,c^{\prime}):b\in\mathcal{B}\}$ . The cost incurred for this step is at most $(1-r)\sum_{b\in\mathcal{B}}|\mu(\tilde{F}[b])-\mu(\tilde{K}[b])|$ . Thus, the total cost is

\sum_{b\in\mathcal{B}}\left(r\|\tilde{F}[b]-\tilde{K}[b]\|_{1}+|\mu(\tilde{F}[b])-\mu(\tilde{K}[b])|\right)\\ =r\|\tilde{F}-\tilde{K}\|_{1}+\sum_{b\in\mathcal{B}}|\mu(\tilde{F}[b])-\mu(\tilde{K}[b])|.

Observe that the term $\sum_{b\in\mathcal{B}}|\mu(\tilde{F}[b])-\mu(\tilde{K}[b])|$ is simply the $\ell_{1}$ distance between $\tilde{F}P$ and $\tilde{K}P$ , where $P\in\mathbb{R}^{(\mathcal{B}\times\mathcal{C})\times\mathcal{B}}$ is the matrix that maps a vector to its sum along each coordinate in $\mathcal{B}$ . Thus, we may form the upper bound

	$\displaystyle\mathbb{E}[d_{\textsf{EM}}(\tilde{F},\tilde{K})]$	$\displaystyle\leq r\mathbb{E}[\|\tilde{F}-\tilde{K}\|_{1}]+\mathbb{E}[\|(\tilde{F}-\tilde{K})P\|_{1}]$
		$\displaystyle\leq r\mathbb{E}[\sqrt{st}\|\tilde{F}-\tilde{K}\|_{2}]+\mathbb{E}[\sqrt{s}\|(\tilde{F}-\tilde{K})P\|_{2}]$
(25)			$\displaystyle\leq r\sqrt{st\mathbb{E}[\|\tilde{F}-\tilde{K}\|_{2}^{2}]}+\sqrt{s\mathbb{E}[\|(\tilde{F}-\tilde{K})P\|_{2}^{2}]}.$

Now, we will bound (25) given this estimator. In the following, let $A_{x}$ denote the $x$ th row of the matrix $A$ . Observe that

	$\displaystyle\tilde{F}-\tilde{K}$	$\displaystyle=\frac{1}{mn}\sum_{i=1}^{mn}z_{i}B-\tilde{K}AB$
		$\displaystyle=\frac{1}{mn}\sum_{i=1}^{mn}z_{i}B-\frac{1}{mn}\sum_{i=1}^{mn}e_{% k_{i}}AB$
		$\displaystyle=\frac{1}{mn}\sum_{i=1}^{mn}z_{i}B-\frac{1}{mn}\sum_{i=1}^{mn}A_{% k_{i}}B$
		$\displaystyle=\frac{1}{mn}\sum_{i=1}^{mn}(z_{i}-A_{k_{i}})B$

Define $w_{i}=z_{i}-A_{k_{i}}$ , and notice that $\mathbb{E}[w_{i}]=\mathbb{E}[z_{i}]-A_{k_{i}}=0$ . Thus,

	$\displaystyle\mathbb{E}[\|\tilde{F}-\tilde{K}\|_{2}^{2}]$	$\displaystyle=\mathbb{E}[(\tilde{F}-\tilde{K})(\tilde{F}-\tilde{K})^{T}]$
		$\displaystyle=\left(\frac{1}{mn}\right)^{2}\mathbb{E}\left[\left(\sum_{i=1}^{% mn}w_{i}B\right)\left(\sum_{i=1}^{mn}B^{T}w_{i}^{T}\right)\right]$
		$\displaystyle=\left(\frac{1}{mn}\right)^{2}\sum_{i,j=1}^{mn}\mathbb{E}[w_{i}BB% ^{T}w_{j}^{T}]$
		$\displaystyle=\left(\frac{1}{mn}\right)^{2}\sum_{i=1}^{mn}\mathbb{E}[w_{i}BB^{% T}w_{i}^{T}],$

where the last step holds because the $w_{i}$ are independent and have mean zero, so the cross terms with $i\neq j$ vanish. Now, we have

	$\displaystyle\mathbb{E}[w_{i}BB^{T}w_{i}^{T}]$	$\displaystyle=\mathbb{E}[z_{i}BB^{T}z_{i}^{T}]-\mathbb{E}[A_{k_{i}}BB^{T}A_{k_% {i}}^{T}]$
		$\displaystyle=\mathbb{E}[z_{i}BB^{T}z_{i}^{T}]-e_{k_{i}}e_{k_{i}}^{T}$
		$\displaystyle\leq\|B^{T}\|_{1,2}^{2}-1.$

Putting it all together, we have

\mathbb{E}[\|\tilde{F}-\tilde{K}\|_{2}^{2}]\leq\frac{\|B^{T}\|_{1,2}^{2}-1}{mn}

To control the term $\|(\tilde{F}-\tilde{K})P\|_{2}^{2}$ in (25), using similar steps, we may write

\mathbb{E}[\|(\tilde{F}-\tilde{K})P\|_{2}^{2}]\leq\left(\frac{1}{mn}\right)^{2% }\sum_{i=1}^{mn}\mathbb{E}[w_{i}BPP^{T}B^{T}w_{i}^{T}].

Similarly, for any $i$ we have

	$\displaystyle\mathbb{E}[w_{i}BPP^{T}B^{T}w_{i}^{T}]$	$\displaystyle=\mathbb{E}[z_{i}BPP^{T}B^{T}z_{i}^{T}]-\mathbb{E}[A_{k_{i}}BPP^{T}B^{T}A_{k_{i}}^{T}]$
		$\displaystyle\leq\|P^{T}B^{T}\|_{1,2}^{2}-1,$

and this implies

\mathbb{E}[\|(\tilde{F}-\tilde{K})P\|_{2}^{2}]\leq\frac{\|P^{T}B^{T}\|_{1,2}^{2}-1}{mn}.

Substituting into (25), we obtain the desired bound.
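As an illustrative sanity check of the first variance bound above, the following Python sketch simulates the estimator by Monte Carlo. The perturbation matrix (keep an item with probability 0.6, otherwise resample uniformly), the domain size, and the sample sizes are hypothetical choices that do not come from the paper. The sketch also assumes that $z_{i}$ is a one-hot vector drawn from the row $A_{k_{i}}$, that $B=A^{-1}$, and that $\|B^{T}\|_{1,2}$ denotes the largest row $\ell_{2}$ norm of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, trials = 5, 200, 2000            # domain size, total items (mn), Monte Carlo runs

# Hypothetical perturbation matrix: keep the item w.p. 0.6, else resample uniformly.
A = 0.6 * np.eye(d) + (0.4 / d) * np.ones((d, d))
B = np.linalg.inv(A)                   # assumed decoding matrix, B = A^{-1}

k = rng.integers(0, d, size=N)         # fixed true items k_1, ..., k_N
K = np.bincount(k, minlength=d) / N    # true histogram (K tilde)
cum = np.cumsum(A[k], axis=1)          # per-item CDF of the row A_{k_i}

sq_err = np.empty(trials)
for j in range(trials):
    u = rng.random(N)
    # Inverse-CDF sampling: z_i = e_x with x drawn from the row A_{k_i}.
    x = np.minimum((u[:, None] > cum).sum(axis=1), d - 1)
    F = (np.bincount(x, minlength=d) / N) @ B   # unbiased estimate (F tilde)
    sq_err[j] = np.sum((F - K) ** 2)

empirical = sq_err.mean()
bound = (np.max(np.sum(B ** 2, axis=1)) - 1) / N
print(empirical, bound)
```

Every row of this particular $B$ has the same norm, so the bound is tight in this example and the two printed values should agree up to Monte Carlo error.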

D.3. Proof of Theorem 6.6

See Theorem 6.6. For positive constants $a,b,c$, the matrix $A$ is given by

A=aI_{\mathcal{X}}+(bI_{\mathcal{B}}+c\textbf{1}_{\mathcal{B}})\otimes\textbf{1}_{\mathcal{C}},

where

	$\displaystyle a$	$\displaystyle=\frac{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}+(s-1)t}$
	$\displaystyle b$	$\displaystyle=\frac{e^{(1-r)\alpha_{0}}-1}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}+(s-1)t}$
	$\displaystyle c$	$\displaystyle=\frac{1}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}+(s-1)t}.$

The matrix $A$ is invertible, and

A^{-1}=a^{\prime}I_{\mathcal{X}}+(b^{\prime}I_{\mathcal{B}}+c^{\prime}\textbf{1}_{\mathcal{B}})\otimes\textbf{1}_{\mathcal{C}},

where

	$\displaystyle a^{\prime}$	$\displaystyle=\frac{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}+(s-1)t}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}$
	$\displaystyle b^{\prime}$	$\displaystyle=-\frac{(e^{(1-r)\alpha_{0}}-1)(e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}+(s-1)t)}{(e^{\alpha_{0}}-e^{(1-r)\alpha_{0}})(e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t)}$
	$\displaystyle c^{\prime}$	$\displaystyle=-\frac{1}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t}.$

It is easy to verify the identity $a^{\prime}+tb^{\prime}+stc^{\prime}=1$. Each row of $A^{-1}$ consists of one copy of $a^{\prime}+b^{\prime}+c^{\prime}$, $t-1$ copies of $b^{\prime}+c^{\prime}$, and $(s-1)t$ copies of $c^{\prime}$. Thus,

	$\displaystyle\|(A^{-1})^{T}\|_{1\rightarrow 2}^{2}-1$
	$\displaystyle=(a^{\prime}+b^{\prime}+c^{\prime})^{2}+(t-1)(b^{\prime}+c^{\prime})^{2}+(s-1)t(c^{\prime})^{2}-1$
	$\displaystyle\;=(1-(t-1)b^{\prime}-(st-1)c^{\prime})^{2}+(t-1)(b^{\prime})^{2}$
	$\displaystyle\qquad+2(t-1)b^{\prime}c^{\prime}+(t-1)(c^{\prime})^{2}+(s-1)t(c^{\prime})^{2}-1$
	$\displaystyle\;=-2(t-1)b^{\prime}-2(st-1)c^{\prime}+2(t-1)(st-1)c^{\prime}b^{\prime}$
	$\displaystyle\qquad+(t-1)^{2}(b^{\prime})^{2}+(st-1)^{2}(c^{\prime})^{2}+(t-1)(b^{\prime})^{2}$
	$\displaystyle\qquad+2(t-1)b^{\prime}c^{\prime}+(t-1)(c^{\prime})^{2}+(s-1)t(c^{\prime})^{2}$
	$\displaystyle\leq(tb^{\prime})^{2}+2st^{2}b^{\prime}c^{\prime}+(stc^{\prime})^{2}-2tb^{\prime}-2stc^{\prime}$
	$\displaystyle\leq(tb^{\prime}+stc^{\prime})^{2}-2(tb^{\prime}+stc^{\prime})$
	$\displaystyle=(a^{\prime})^{2}-1.$

Substituting, we obtain

	$\displaystyle(a^{\prime})^{2}-1$	$\displaystyle\leq\left(\frac{te^{(1-r)\alpha_{0}}+(s-1)t}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\right)^{2}+2\left(\frac{te^{(1-r)\alpha_{0}}+(s-1)t}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\right)$
		$\displaystyle\leq\frac{t^{2}e^{2\alpha_{0}}+2(s-1)t^{2}e^{\alpha_{0}}+(s-1)^{2}t^{2}+2te^{2\alpha_{0}}+2(s-1)te^{\alpha_{0}}}{(e^{\alpha_{0}}-e^{(1-r)\alpha_{0}})^{2}}$
		$\displaystyle\leq\left(t\frac{e^{\alpha_{0}}+s}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\right)^{2}.$
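The closed form for $A^{-1}$ and the row-norm bound can be checked numerically. The sketch below is only a sanity check under stated assumptions: the values of $s,t,r,\alpha_{0}$ are illustrative and not taken from the paper, $\textbf{1}_{\mathcal{B}}$ and $\textbf{1}_{\mathcal{C}}$ are read as the $s\times s$ and $t\times t$ all-ones matrices, and the helper `block` is a hypothetical name introduced here.

```python
import numpy as np

s, t, r, alpha0 = 3, 4, 0.5, 1.0       # illustrative values, not from the paper
e1, e2 = np.exp(alpha0), np.exp((1 - r) * alpha0)
D = e1 + (t - 1) * e2 + (s - 1) * t    # common denominator of a, b, c
E = e1 + (t - 1) * e2 - t              # denominator appearing in b' and c'

a, b, c = (e1 - e2) / D, (e2 - 1) / D, 1 / D
ap, bp, cp = D / (e1 - e2), -(e2 - 1) * D / ((e1 - e2) * E), -1 / E

def block(x, y, z):
    """Return x*I_X + (y*I_B + z*J_B) kron J_C, where J is an all-ones matrix."""
    inner = y * np.eye(s) + z * np.ones((s, s))
    return x * np.eye(s * t) + np.kron(inner, np.ones((t, t)))

A, A_inv = block(a, b, c), block(ap, bp, cp)
row_sq = np.sum(A_inv ** 2, axis=1)    # squared l2 norm of each row of A^{-1}

print(np.allclose(A @ A_inv, np.eye(s * t)))        # closed-form inverse
print(np.isclose(ap + t * bp + s * t * cp, 1.0))    # identity a' + t b' + s t c' = 1
print(row_sq.max() - 1 <= ap ** 2 - 1)              # row-norm bound
```

A check at a single parameter setting does not replace the proof; it only guards against transcription errors in the constants.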

Next, it’s easy to see that

	$\displaystyle A^{-1}P$	$\displaystyle=\left(a^{\prime}I_{\mathcal{X}}+((b^{\prime}I_{\mathcal{B}}+c^{\prime}\textbf{1}_{\mathcal{B}})\otimes\textbf{1}_{\mathcal{C}})\right)(I_{\mathcal{B}}\otimes 1_{\mathcal{C}})$
		$\displaystyle=a^{\prime}I_{\mathcal{B}}\otimes 1_{\mathcal{C}}+(b^{\prime}I_{\mathcal{B}}+c^{\prime}\textbf{1}_{\mathcal{B}})\otimes t1_{\mathcal{C}}.$

Each row of the latter consists of one copy of $a^{\prime}+tb^{\prime}+tc^{\prime}$ and $s-1$ copies of $tc^{\prime}$. This gives us

	$\displaystyle\|(A^{-1}P)^{T}\|_{1\rightarrow 2}^{2}-1$	$\displaystyle=(a^{\prime}+tb^{\prime}+tc^{\prime})^{2}+(s-1)(tc^{\prime})^{2}-1$
		$\displaystyle=(1-(s-1)tc^{\prime})^{2}+(s-1)(tc^{\prime})^{2}-1$
		$\displaystyle=s(s-1)(tc^{\prime})^{2}-2(s-1)(tc^{\prime})$
		$\displaystyle\leq(stc^{\prime})^{2}-2(stc^{\prime}).$

Substituting, we obtain

	$\displaystyle(stc^{\prime})^{2}-2(stc^{\prime})$	$\displaystyle=\frac{st(st+2(e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t))}{(e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t)^{2}}$
		$\displaystyle\leq\frac{st^{2}(s+2(e^{\alpha_{0}}-1))}{(e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t)^{2}}.$
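The row structure of $A^{-1}P$ and the resulting chain of (in)equalities can likewise be checked numerically. This is a sketch under assumptions: the parameter values are illustrative, $P=I_{\mathcal{B}}\otimes 1_{\mathcal{C}}$ is read as the $st\times s$ matrix that sums the $t$ coordinates of each bucket, and the norm is taken as the largest row $\ell_{2}$ norm.

```python
import numpy as np

s, t, r, alpha0 = 3, 4, 0.5, 1.0       # illustrative values, not from the paper
e1, e2 = np.exp(alpha0), np.exp((1 - r) * alpha0)
D = e1 + (t - 1) * e2 + (s - 1) * t
E = e1 + (t - 1) * e2 - t
ap, bp, cp = D / (e1 - e2), -(e2 - 1) * D / ((e1 - e2) * E), -1 / E

# A^{-1} = a' I_X + (b' I_B + c' J_B) kron J_C, with J an all-ones matrix.
A_inv = ap * np.eye(s * t) + np.kron(bp * np.eye(s) + cp * np.ones((s, s)),
                                     np.ones((t, t)))
# P = I_B kron 1_C, with 1_C read as a length-t column of ones (assumption).
P = np.kron(np.eye(s), np.ones((t, 1)))
M = A_inv @ P                          # shape (s*t, s)

lhs = np.sum(M ** 2, axis=1).max() - 1                       # max row norm^2 - 1
exact = s * (s - 1) * (t * cp) ** 2 - 2 * (s - 1) * (t * cp)
upper = (s * t * cp) ** 2 - 2 * (s * t * cp)
closed = s * t * (s * t + 2 * E) / E ** 2                    # substituted form

print(lhs, exact, upper, closed)
```

Under these assumptions the first two printed values should coincide, as should the last two, with the second pair no smaller than the first.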

Applying Theorem 6.5, we obtain

	$\displaystyle\mathbb{E}[d_{\textsf{EM}}(\tilde{F},\tilde{K})]$
	$\displaystyle\;\leq r\sqrt{\frac{st((a^{\prime})^{2}-1)}{mn}}+\sqrt{\frac{s^{2}t(st(c^{\prime})^{2}-2c^{\prime})}{mn}}$
	$\displaystyle\;\leq r\sqrt{\frac{st^{3}}{mn}}\left(\frac{e^{\alpha_{0}}+s}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\right)+\sqrt{\frac{s^{2}t^{2}}{mn}}\left(\frac{\sqrt{s+2(e^{\alpha_{0}}-1)}}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t}\right),$

finishing the claim. To obtain an asymptotic bound (with budget $\alpha=\varepsilon/r$), we plug in (6), which says that we may set

\alpha_{0}=\begin{cases}\frac{\alpha}{32\sqrt{m\ln(4m\exp(\alpha)/\delta)}}&\text{if }\alpha\leq 32\sqrt{m\ln(4m\exp(\alpha)/\delta)}\\ 2\ln\left(\frac{\varepsilon}{16r\sqrt{m\ln(4m\exp(\alpha)/\delta)}}\right)&\text{if }32r\sqrt{m\ln(4m\exp(\alpha)/\delta)}\leq\varepsilon\leq rm\end{cases}.

In the first case, we have

	$\displaystyle\frac{e^{\alpha_{0}}+s}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}\leq\frac{s}{r\alpha_{0}}$
	$\displaystyle\frac{\sqrt{s+2(e^{\alpha_{0}}-1)}}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t}\leq\frac{2\sqrt{s}}{\alpha_{0}t},$

and this implies

	$\displaystyle\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]$	$\displaystyle\leq\sqrt{\frac{s^{3}t^{3}}{mn}}\frac{1}{\alpha_{0}}+\sqrt{\frac{s^{3}t^{2}}{mn}}\frac{2}{\alpha_{0}}$
		$\displaystyle\leq\frac{64r(st)^{3/2}\sqrt{\ln(4m\exp(\tfrac{\varepsilon}{r})/\delta)}}{\varepsilon\sqrt{n}}.$

In the second case, we have

	$\displaystyle\frac{e^{\alpha_{0}}+s}{e^{\alpha_{0}}-e^{(1-r)\alpha_{0}}}=\frac{1+s/e^{\alpha_{0}}}{1-e^{-r\alpha_{0}}}\leq 2\frac{1+se^{-\alpha_{0}}}{\min\{1,r\alpha_{0}\}}$
	$\displaystyle\frac{\sqrt{s+2(e^{\alpha_{0}}-1)}}{e^{\alpha_{0}}+(t-1)e^{(1-r)\alpha_{0}}-t}\leq\frac{\sqrt{2(s+e^{\alpha_{0}})}}{e^{\alpha_{0}}}.$

This implies

	$\displaystyle\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]$
	$\displaystyle\;\leq 2\left(1+\tfrac{1}{r\alpha_{0}}\right)(1+se^{-\alpha_{0}})r\sqrt{\frac{st^{3}}{mn}}+2\left(e^{-\alpha_{0}}\sqrt{s}+e^{-\alpha_{0}/2}\right)\sqrt{\frac{s^{2}t^{2}}{mn}}$
	$\displaystyle\;\leq 2(1+se^{-\alpha_{0}})\sqrt{\frac{st^{3}}{mn}}+2\left(e^{-\alpha_{0}}\sqrt{s}+e^{-\alpha_{0}/2}\right)\sqrt{\frac{s^{2}t^{2}}{mn}}$
	$\displaystyle\;\leq 2(1+\sqrt{s}e^{-\alpha_{0}/2}+se^{-\alpha_{0}})\sqrt{\frac{st^{3}}{mn}}$
	$\displaystyle\;\leq 4(1+se^{-\alpha_{0}})\sqrt{\frac{st^{3}}{mn}}$
	$\displaystyle\;\leq 4\sqrt{\frac{st^{3}}{mn}}+1024\frac{r^{2}\sqrt{ms^{3}t^{3}}}{\varepsilon^{2}\sqrt{n}}\ln(4m\exp(\varepsilon/r)/\delta)$
	$\displaystyle\;\leq 4\sqrt{\frac{st^{3}}{mn}}+32\frac{r\sqrt{s^{3}t^{3}}}{\varepsilon\sqrt{n}}\sqrt{\ln(4m\exp(\varepsilon/r)/\delta)}.$

In both cases, the desired bound has been shown.

D.4. Proof of Lemma 6.7

We use the bound that $d_{\textsf{EM}}(\tilde{K},\tilde{F})\leq\|\tilde{K}-\tilde{F}\|_{1}$. In each coordinate, the expected error introduced by the Laplace noise is at most $O(\frac{1}{n\varepsilon})$, and thus $\mathbb{E}[\|\tilde{K}-\tilde{F}\|_{1}]\leq O(\frac{k}{n\varepsilon})$. Normalizing will only reduce this error.
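The per-coordinate claim rests on the standard fact that a Laplace variable with scale $b$ has expected absolute value $b$. The following sketch illustrates the resulting $\ell_{1}$ error before normalization; the noise scale $1/(n\varepsilon)$ and the values of $k$, $n$, and $\varepsilon$ are assumptions made for illustration, since the constant hidden in the $O(\cdot)$ is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, eps, trials = 10, 1000, 1.0, 20000   # illustrative values
scale = 1.0 / (n * eps)                    # assumed per-coordinate Laplace scale

# Independent Laplace noise added to each of the k histogram coordinates.
noise = rng.laplace(loc=0.0, scale=scale, size=(trials, k))
l1_err = np.abs(noise).sum(axis=1).mean()  # Monte Carlo estimate of E||noise||_1

print(l1_err, k * scale)                   # estimate vs. the exact value k * scale
```

The two printed values should agree up to sampling error, matching the $O(\frac{k}{n\varepsilon})$ rate.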

D.5. Proof of Corollary 6.8

See Corollary 6.8. Our mechanism simply combines the itemsets into one large itemset $K$ with $mn$ elements (and one global user), and then applies the algorithm of Theorem 6.6. By Theorem 4.4, the privacy budget is $(\alpha,\delta)$, where

\alpha_{0}=\begin{cases}\frac{\alpha\sqrt{n}}{32\sqrt{m\ln(4me^{\alpha}/\delta)}}&\text{if }\alpha\sqrt{n}\leq 32\sqrt{m\ln(4me^{\alpha}/\delta)}\\ 2\ln\left(\frac{\alpha\sqrt{n}}{16\sqrt{m\ln(4me^{\alpha}/\delta)}}\right)&\text{if }32\sqrt{m\ln(4me^{\alpha}/\delta)}<\alpha\sqrt{n}<m\sqrt{n}\end{cases}.

Following the proof in Section D.3 (and setting $\alpha=\frac{\varepsilon}{r}$), we can show that

\mathbb{E}[d_{\textsf{EM}}(\tilde{K},\tilde{F})]\leq 4\sqrt{\frac{st^{3}}{mn}}+64\frac{r\sqrt{s^{3}t^{3}}}{\varepsilon n}\sqrt{\ln(4m\exp(\varepsilon/r)/\delta)}.
