Obtaining $(\epsilon,\delta)$-differential privacy guarantees when using a Poisson mechanism to synthesize contingency tables

James Jackson Lancaster University, Lancaster, UK Robin Mitra University College London, London, UK Brian Francis Lancaster University, Lancaster, UK Iain Dove
Abstract

We show that differential privacy type guarantees can be obtained when using a Poisson synthesis mechanism to protect counts in contingency tables. Specifically, we show how to obtain $(\epsilon,\delta)$-probabilistic differential privacy guarantees via the Poisson distribution’s cumulative distribution function. We demonstrate this empirically with the synthesis of an administrative-type confidential database.

1 Introduction

Differential privacy (DP) (Dwork et al., 2006) is a property of a perturbation mechanism that formally quantifies how accurately any individual’s true values can be established, given all other individuals’ true values are known. Originally developed as a way to protect the privacy of summary statistics (queries), it soon expanded as a way to protect entire data sets. Differentially private data synthesis (DIPS) has since become a popular area of research; see, for example, Abowd and Vilhuber (2008); Machanavajjhala et al. (2008); Charest (2011); McClure and Reiter (2012); Bowen and Liu (2020); Quick (2021); Drechsler (2023).

In Jackson et al. (2022b, a), we proposed a synthesis approach for contingency tables, which takes place at the tabular level, and that uses saturated count models. This approach effectively uses a count distribution to apply noise to the counts in the original data’s contingency table, and therefore shares traits with DP mechanisms which apply noise in a similar way. Note that as microdata composed entirely of categorical variables can be expressed in contingency table format, this approach is suitable in the case of categorical data more generally.

In this paper, we consider the ability to obtain DP guarantees when using the Poisson distribution to synthesize counts in contingency tables. We show that although $\epsilon$-DP cannot be satisfied, $(\epsilon,\delta)$-DP guarantees can be obtained through the use of the Poisson’s cumulative distribution function (CDF).

The motivation behind this work is that, with the exception of Quick (2021), the use of count distributions has largely been overlooked as a way to satisfy DP. An obvious benefit of using count distributions is that negative counts cannot be obtained. As the Poisson has only one parameter and hence is likely to be sub-optimal, the intention is that in the future the Poisson could be replaced with more complex count distributions, such as the (discretised) gamma family distribution, where additional parameters provide scope for fine-tuning.

The paper is structured as follows. Section 2 introduces some terminology and definitions. Section 3 looks at existing DP mechanisms for contingency tables, such as the (discretised) Laplace and Gaussian mechanisms. Section 4 gives our novel contribution, the ability to obtain $(\epsilon,\delta)$-DP guarantees when using a Poisson synthesis mechanism. Section 5 gives an empirical example using an administrative database. Section 6 gives some concluding remarks.

2 Terminology and definitions

Rinott et al. (2018) set out how DP extends into a contingency table setting. Following their notation, let $\mathbf{a}=(a_{1},\ldots,a_{K})\in\mathcal{A}$ and $\mathbf{b}=(b_{1},\ldots,b_{K})\in\mathcal{B}$ denote vectors of counts in the original and synthetic data’s contingency tables, respectively, where $K$ denotes the number of cells and $\mathcal{A}$ and $\mathcal{B}$ denote the ranges of obtainable original and synthetic counts, respectively. For contingency tables, we suppose that $\mathcal{A}=\mathcal{B}=\mathbb{Z}_{\geq 0}^{K}$, where $\mathbb{Z}_{\geq 0}$ is the set of non-negative integers.

Moreover, we describe $\mathbf{a}$ and $\mathbf{a}^{\prime}$ as neighbours, denoted by $\mathbf{a}\sim\mathbf{a}^{\prime}$, whenever all but one of the counts in $\mathbf{a}$ and $\mathbf{a}^{\prime}$ are identical and the differing count differs by exactly one. Henceforth, without loss of generality, we suppose $\mathbf{a}$ and $\mathbf{a}^{\prime}$ differ in their $k$th element only, i.e. $a_{k}^{\prime}=a_{k}-1$ and $a_{i}=a_{i}^{\prime}$ for $i=1,\ldots,K$, $i\neq k$. Thus $\mathbf{a}^{\prime}$ represents the data held by the intruder (who knows all but one of the individuals’ true values) and $\mathbf{a}$ represents the completed data where the “unknown individual” has been added to the cell in which they truly belong.
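The neighbour relation can be made concrete with a short check. The following is a minimal sketch in Python; the function name `are_neighbours` and the example tables are illustrative assumptions and do not come from Rinott et al. (2018).

```python
from typing import Sequence


def are_neighbours(a: Sequence[int], a_prime: Sequence[int]) -> bool:
    """Return True if the count vectors differ in exactly one cell, by exactly one."""
    if len(a) != len(a_prime):
        return False
    # Absolute differences of the cells that are not identical.
    diffs = [abs(x - y) for x, y in zip(a, a_prime) if x != y]
    return diffs == [1]


# Hypothetical three-cell tables: the intruder's table lacks one individual in cell 3.
a = [3, 0, 5]
a_prime = [3, 0, 4]
print(are_neighbours(a, a_prime))    # True
print(are_neighbours(a, [3, 1, 4]))  # False: two cells differ
```

Under this relation the intruder's uncertainty is confined to a single cell, which is why the likelihood ratios considered below reduce to a ratio involving the $k$th cell only.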

The $\epsilon$-DP definition revolves around the likelihood ratio, or, more accurately, around a series of likelihood ratios.

Definition 1 ($\epsilon$-DP)

A perturbation mechanism $\mathcal{M}$ satisfies $\epsilon$-DP ($\epsilon>0$) if:

$$\exp(-\epsilon)\leq\frac{\mathbb{P}\left(\mathcal{M}(\mathbf{a})=\mathbf{b}\right)}{\mathbb{P}\left(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b}\right)}\leq\exp(\epsilon),\quad\forall\;\mathbf{a}\sim\mathbf{a}^{\prime}\in\mathcal{A},\;\forall\;\mathbf{b}\in\mathcal{B}. \tag{1}$$

Definition 1 is the special case of the standard DP definition, given in Dwork et al. (2006), for when the ranges $\mathcal{A}$ and $\mathcal{B}$ are discrete. Although the denominator in (1) could, in some instances, be equal to zero, for the mechanisms we consider here this probability is always non-zero.

For any $\mathbf{a}$, $\mathbf{a}^{\prime}$ and $\mathbf{b}$, whenever the ratio $\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})/\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})$ is either very small or very large, too much can be gleaned about the unknown individual’s true values. It is worth noting, too, that the above definition considers all possible synthetic data sets in $\mathcal{B}$, illustrating that DP is not a risk metric for a particular synthetic data set but rather a property of a synthesis mechanism.

Somewhat confusingly, there are two similar but different relaxations of $\epsilon$-DP. The first is $(\epsilon,\delta)$-differential privacy (Dwork and Roth, 2014). The second is known as $(\epsilon,\delta)$-probabilistic differential privacy (Machanavajjhala et al., 2008). These are given below in Definitions 2 and 3. In the remainder of this paper, we focus on $(\epsilon,\delta)$-probabilistic DP. Yet whenever $(\epsilon,\delta)$-probabilistic DP is satisfied, $(\epsilon,\delta)$-DP is also satisfied (Goetz et al., 2012).

Definition 2 ($(\epsilon,\delta)$-DP)

A perturbation mechanism $\mathcal{M}$ satisfies $(\epsilon,\delta)$-DP ($\epsilon>0$; $0\leq\delta\leq 1$) if:

$$\frac{\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})-\delta}{\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})}\leq\exp(\epsilon)\quad\text{and}\quad\frac{\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})-\delta}{\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})}\leq\exp(\epsilon),\quad\forall\;\mathbf{a}\sim\mathbf{a}^{\prime}\in\mathcal{A},\;\mathbf{b}\in\mathcal{B}. \tag{2}$$

Definition 3 ($(\epsilon,\delta)$-probabilistic DP)

A perturbation mechanism $\mathcal{M}$ satisfies $(\epsilon,\delta)$-probabilistic DP ($\epsilon>0$; $0\leq\delta\leq 1$) if:

$$\mathbb{P}\bigg[\frac{1}{\exp(\epsilon)}\leq\frac{\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})}\leq\exp(\epsilon)\bigg]>1-\delta\quad\forall\;\mathbf{a}\sim\mathbf{a}^{\prime}\in\mathcal{A},\;\mathbf{b}\in\mathcal{B}. \tag{3}$$

Theorem 1 ($(\epsilon,\delta)$-probabilistic DP implies $(\epsilon,\delta)$-DP)

If a perturbation mechanism $\mathcal{M}$ satisfies $(\epsilon,\delta)$-probabilistic DP, then it also satisfies $(\epsilon,\delta)$-DP. (Proof: see Goetz et al. (2012))

3 Examples of existing DP mechanisms

We now give examples of existing DP mechanisms suitable for synthesizing counts in contingency tables. Note that for the Laplace and Gaussian mechanisms, discretised noise needs to be added (unless one is willing to accept non-integer “counts”). This can simply involve adding continuous noise before rounding the adjusted values to the nearest integer. Similarly, negative values can be rounded to zero.
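The rounding and truncation steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names (`sample_laplace`, `discretise`, `noisy_counts`) and the example counts are assumptions made for the sketch and are not taken from any of the cited works.

```python
import random


def sample_laplace(mu: float, scale: float, rng: random.Random) -> float:
    """Draw from Laplace(mu, scale) as the difference of two exponential variates."""
    # expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return mu + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def discretise(value: float) -> int:
    """Round to the nearest integer, then set negative values to zero."""
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    return max(0, round(value))


def noisy_counts(counts, scale, seed=0):
    """Add continuous Laplace noise to each count, then discretise."""
    rng = random.Random(seed)
    return [discretise(sample_laplace(a, scale, rng)) for a in counts]


# Hypothetical original counts; scale = 1 / epsilon with epsilon = 1.
print(noisy_counts([0, 3, 10, 250], scale=1.0))
```

Both steps act only on the noisy values and not on the original counts, so they can be viewed as post-processing of the mechanism's output.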

Example 1 (The Laplace mechanism)

A random variable $X\sim\text{Laplace}(\mu,d)$ has probability density function $f_{L}$:

$$f_{L}(x;\mu,d)=\frac{1}{2d}\exp\left(-\frac{|x-\mu|}{d}\right).$$

The Laplace mechanism $\mathcal{M}_{L}$ satisfies $\epsilon$-DP by using the Laplace distribution to add random noise to the original counts $\mathbf{a}$. Specifically, for every original count $a_{i}$, the Laplace mechanism generates a Laplace$(a_{i},1/\epsilon)$ random variate. To show that this mechanism does indeed satisfy DP, we suppose that $a_{i}=a_{i}^{\prime}$ for $i=1,\ldots,k-1,k+1,\ldots,K$ and that $a_{k}^{\prime}=a_{k}-1$ (i.e. the assumptions made in Section 2). Firstly, when $b_{k}>a_{k}$:

$$\begin{aligned}
\frac{\mathbb{P}(\mathcal{M}_{L}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_{L}(\mathbf{a}^{\prime})=\mathbf{b})}
&=\frac{\exp\left(-\epsilon|b_{k}-a_{k}|\right)}{\exp\left(-\epsilon|b_{k}-a_{k}^{\prime}|\right)}\\
&=\frac{\exp\left(-\epsilon|b_{k}-a_{k}|\right)}{\exp\left(-\epsilon|b_{k}-(a_{k}-1)|\right)}\\
&=\exp(\epsilon).
\end{aligned} \tag{4}$$

Similarly, when $a_{k}>b_{k}$, the ratio in (4) is equal to $\exp(-\epsilon)$, and when $a_{k}=b_{k}$ it is equal to $\exp(\epsilon)$, since then $|b_{k}-a_{k}|=0$ and $|b_{k}-a_{k}^{\prime}|=1$. In every case the ratio lies between $\exp(-\epsilon)$ and $\exp(\epsilon)$. Hence the DP definition in (1) holds.
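The bounds on the ratio in (4) can also be checked numerically. The sketch below evaluates the Laplace density ratio for neighbouring counts $a_{k}$ and $a_{k}-1$ over a range of integer outputs; the function names and the chosen values of $\epsilon$ and $a_{k}$ are illustrative assumptions.

```python
import math


def laplace_pdf(x: float, mu: float, d: float) -> float:
    """Density of Laplace(mu, d) evaluated at x."""
    return math.exp(-abs(x - mu) / d) / (2.0 * d)


def laplace_ratio(b_k: float, a_k: int, eps: float) -> float:
    """Density ratio for neighbouring counts a_k and a_k - 1, with scale d = 1 / eps."""
    d = 1.0 / eps
    return laplace_pdf(b_k, a_k, d) / laplace_pdf(b_k, a_k - 1, d)


eps, a_k = 0.5, 4  # hypothetical values
lower, upper = math.exp(-eps), math.exp(eps)
for b_k in range(0, 9):
    ratio = laplace_ratio(b_k, a_k, eps)
    # The ratio never leaves [exp(-eps), exp(eps)], up to floating-point error.
    assert lower - 1e-12 <= ratio <= upper + 1e-12
    print(b_k, round(ratio, 4))
```

The check uses the continuous densities, as in the derivation of (4); it does not account for the rounding step used to discretise the output.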

Example 2 (The Gaussian mechanism)

A random variable $X\sim\text{Normal}(\mu,\sigma^{2})$ has probability density function $f_{G}$:

$$f_{G}(x;\mu,\sigma^{2})=\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right].$$

In a similar way to the Laplace mechanism, the Gaussian mechanism, say $\mathcal{M}_{G}$, applies Normal$(0,\sigma^{2})$ random noise to the original counts, resulting in a mechanism that satisfies $(\epsilon,\delta)$-differential privacy. Using the same assumptions and notation as before, it follows that:

$$\begin{aligned}
\frac{\mathbb{P}(\mathcal{M}_{G}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_{G}(\mathbf{a}^{\prime})=\mathbf{b})}
&=\frac{\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{b_{k}-a_{k}}{\sigma}\right)^{2}\right]}{\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{b_{k}-a_{k}+1}{\sigma}\right)^{2}\right]}\\
&=\exp\left[-\frac{1}{2\sigma^{2}}(2a_{k}-2b_{k}-1)\right].
\end{aligned}$$

Recall that $(\epsilon,\delta)$-probabilistic DP is satisfied whenever

$$\frac{1}{\exp(\epsilon)}\leq\frac{\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})}\leq\exp(\epsilon)\quad\text{with probability $1-\delta$,}$$

which, in this instance, occurs whenever

$$-\epsilon\leq-\frac{1}{2\sigma^{2}}(2a_{k}-2b_{k}-1)\leq\epsilon\quad\text{with probability $1-\delta$}.$$

The probability $1-\delta$ can be obtained from $\Phi$, the normal distribution’s CDF (Balle and Wang, 2018), as $b_{k}\sim\text{Normal}(a_{k},\sigma^{2})$:

$$\begin{aligned}
1-\delta
&=\mathbb{P}\left(-\epsilon\leq-\frac{1}{2\sigma^{2}}(2a_{k}-2b_{k}-1)\leq\epsilon\right)\\
&=\mathbb{P}\left(a_{k}-\sigma^{2}\epsilon-1/2\leq b_{k}\leq a_{k}+\sigma^{2}\epsilon-1/2\right)\\
&=\Phi\left(\frac{a_{k}+\sigma^{2}\epsilon-1/2-a_{k}}{\sigma}\right)-\Phi\left(\frac{a_{k}-\sigma^{2}\epsilon-1/2-a_{k}}{\sigma}\right)\\
&=\Phi\left(\sigma\epsilon-1/(2\sigma)\right)-\Phi\left(-\sigma\epsilon-1/(2\sigma)\right).
\end{aligned}$$
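The final expression for $1-\delta$ depends only on $\sigma$ and $\epsilon$, so $\delta$ is straightforward to evaluate. The following is a minimal sketch that computes $\Phi$ from the error function in the Python standard library; the function names and the values of $\sigma$ and $\epsilon$ are assumptions made for illustration, and the calculation treats the noise as continuous (it ignores any rounding of the output).

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_delta(sigma: float, eps: float) -> float:
    """delta = 1 - [Phi(sigma*eps - 1/(2*sigma)) - Phi(-sigma*eps - 1/(2*sigma))]."""
    upper = std_normal_cdf(sigma * eps - 1.0 / (2.0 * sigma))
    lower = std_normal_cdf(-sigma * eps - 1.0 / (2.0 * sigma))
    return 1.0 - (upper - lower)


# Hypothetical settings: fix eps = 1 and vary the noise standard deviation.
for sigma in (1.0, 2.0, 4.0):
    print(sigma, gaussian_delta(sigma, eps=1.0))
```

For a fixed $\epsilon$, increasing $\sigma$ reduces $\delta$, which reflects the usual trade-off between the amount of noise and the strength of the guarantee.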

Example 3 (Multinomial-Dirichlet synthesizer)

A multinomial-Dirichlet synthesis mechanism (Abowd and Vilhuber, 2008), say $\mathcal{M}_{MD}$, can also yield DP guarantees. The original counts $\mathbf{a}$ can be converted to cell probabilities $\boldsymbol{\pi}$ simply by dividing by $n$ (the number of individuals in the data). A Dirichlet prior with concentration parameters $\boldsymbol{\alpha}=(\alpha_{1},\alpha_{2},\ldots,\alpha_{K})$ is placed on $\boldsymbol{\pi}$ (see Abowd and Vilhuber (2008) for more on this approach). Using the same “without loss of generality” assumptions as before, it follows that

$$\begin{aligned}
\frac{\mathbb{P}(\mathcal{M}_{MD}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_{MD}(\mathbf{a}^{\prime})=\mathbf{b})}
&=\frac{\Gamma(b_{k}+a_{k}+\alpha_{k})}{\Gamma(a_{k}+\alpha_{k})}\cdot\frac{\Gamma(a_{k}^{\prime}+\alpha_{k})}{\Gamma(b_{k}+a_{k}^{\prime}+\alpha_{k})}\\
&=\frac{\Gamma(b_{k}+a_{k}+\alpha_{k})}{\Gamma(a_{k}+\alpha_{k})}\cdot\frac{\Gamma(a_{k}-1+\alpha_{k})}{\Gamma(b_{k}+a_{k}-1+\alpha_{k})}\\
&=\frac{b_{k}+a_{k}-1+\alpha_{k}}{a_{k}-1+\alpha_{k}}.
\end{aligned} \tag{5}$$

Recall again that DP is satisfied whenever

$$\frac{1}{\exp(\epsilon)}\leq\frac{\mathbb{P}(\mathcal{M}(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}(\mathbf{a}^{\prime})=\mathbf{b})}\leq\exp(\epsilon).$$

As the expression in (5) is always greater than or equal to one, and hence always greater than $1/\exp(\epsilon)$, DP is satisfied whenever

$$\frac{b_{k}+a_{k}-1+\alpha_{k}}{a_{k}-1+\alpha_{k}}\leq\exp(\epsilon).$$

As $a_{k}\geq 1$ and $b_{k}\leq n$, this simplifies to

$$\frac{n+\alpha_{k}}{\alpha_{k}}\leq\exp(\epsilon)\quad\Rightarrow\quad\alpha_{k}\geq\frac{n}{\exp(\epsilon)-1}.$$

Considering all counts $a_{1},\ldots,a_{K}$, the bound above must hold for every cell, so DP is satisfied whenever

$$\min_{i}\alpha_{i}\geq\frac{n}{\exp(\epsilon)-1},$$

a result from Machanavajjhala et al. (2008).
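To give a sense of scale, the per-cell bound $\alpha_{k}\geq n/(\exp(\epsilon)-1)$ derived above can be evaluated directly. The sketch below is illustrative only; the function names and the values of $n$ and $\epsilon$ are assumptions, and the ratio function simply encodes the expression in (5).

```python
import math


def alpha_lower_bound(n: int, eps: float) -> float:
    """Smallest concentration parameter satisfying alpha >= n / (exp(eps) - 1)."""
    return n / (math.exp(eps) - 1.0)


def md_ratio(b_k: int, a_k: int, alpha_k: float) -> float:
    """Likelihood ratio in (5): (b_k + a_k - 1 + alpha_k) / (a_k - 1 + alpha_k)."""
    return (b_k + a_k - 1 + alpha_k) / (a_k - 1 + alpha_k)


n, eps = 1000, 1.0  # hypothetical sample size and privacy parameter
alpha = alpha_lower_bound(n, eps)
print(alpha)  # about 582 for n = 1000, eps = 1

# Worst case considered in the text: a_k = 1 and b_k = n.
worst = md_ratio(b_k=n, a_k=1, alpha_k=alpha)
print(worst, math.exp(eps))  # the two values agree up to floating-point error
```

With these settings the prior contributes more than half as many pseudo-observations to the cell as there are individuals in the data, which illustrates how demanding the requirement can be.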

4 Satisfying (ϵ,δ)italic-ϵ𝛿(\epsilon,\delta)( italic_ϵ , italic_δ )-probabilistic DP with a Poisson synthesis mechanism

When using saturated count models to synthesize contingency tables, as set out in Jackson et al. (2022b), a count distribution, e.g. the Poisson, applies noise to original counts. We assume that a constant pseudocount α>0𝛼0\alpha>0italic_α > 0 is added to every element of 𝐚𝐚\mathbf{a}bold_a (i.e. to all original counts, not just to zero counts as in Jackson et al. (2022b)), which opens up the possibility that original counts of zero can be synthesized to non-zeros. When using the Poisson we apply the following mechanism, which we denote by Psubscript𝑃\mathcal{M}_{P}caligraphic_M start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT, to obtain a set of synthetic counts:

$$
\begin{aligned}
b_i \mid a_i, \alpha &\sim \text{Poisson}(a_i+\alpha), \quad i=1,\ldots,K,\\
\text{i.e.}\quad \mathbb{P}(\mathcal{M}_P(a_i)=b_i) &= \frac{\exp(-a_i-\alpha)\,(a_i+\alpha)^{b_i}}{b_i!}, \quad i=1,\ldots,K.
\end{aligned}
$$
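As an illustration of the mechanism $\mathcal{M}_P$, the following is a minimal Python sketch rather than the implementation used for the results in this paper. The function names, the toy counts and the inversion sampler are assumptions made for the example. The sampler is only suitable for small means; in practice a library routine such as NumPy's `Generator.poisson` would typically be used instead.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by inverting the CDF (small means only)."""
    if not 0 < mean < 700:
        raise ValueError("this sketch only supports 0 < mean < 700")
    u = rng.random()
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    # Walk up the support until the CDF passes u; stop if the pmf underflows.
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= mean / k  # P(X = k) obtained from P(X = k - 1)
        cdf += pmf
    return k


def poisson_synthesis(counts, alpha, seed=None):
    """Draw b_i ~ Poisson(a_i + alpha) independently for every cell."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    return [sample_poisson(a + alpha, rng) for a in counts]


# Toy contingency-table cells (made up for illustration).
original = [0, 1, 3, 12, 0, 7]
synthetic = poisson_synthesis(original, alpha=0.5, seed=1)
print(synthetic)
```

Because $\alpha>0$, cells with an original count of zero have a positive Poisson mean and so can be synthesized to non-zero counts.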
Supposing once again that $\mathbf{a}$ and $\mathbf{a}'$ differ in their $k$th element only, we have:

$$\frac{\mathbb{P}(\mathcal{M}_P(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_P(\mathbf{a}')=\mathbf{b})} = \exp(-1)\left(\frac{a_k+\alpha}{a_k-1+\alpha}\right)^{b_k}. \tag{6}$$

This quantity is bounded below by $\exp(-1)$, with this minimum occurring when $b_k=0$. It is unbounded above, however, as $b_k$ can be any non-negative integer; i.e. the expression in (6) tends to infinity as $b_k$ tends to infinity. Thus $\epsilon$-DP cannot be satisfied.
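The behaviour of the ratio in (6) can be checked numerically. The short Python sketch below is illustrative only; the function name and the chosen values of $a_k$, $\alpha$ and $b_k$ are assumptions made for the example.

```python
import math


def likelihood_ratio(a_k, b_k, alpha):
    """Ratio (6): P(M_P(a) = b) / P(M_P(a') = b), where a'_k = a_k - 1."""
    if a_k - 1 + alpha <= 0:
        raise ValueError("need a_k - 1 + alpha > 0")
    return math.exp(-1.0) * ((a_k + alpha) / (a_k - 1 + alpha)) ** b_k


# The ratio equals exp(-1) at b_k = 0 and grows without bound as b_k increases.
for b_k in (0, 1, 5, 10, 20):
    print(b_k, likelihood_ratio(a_k=1, b_k=b_k, alpha=0.5))
```

No finite $\epsilon$ can bound this sequence from above, which is why only a probabilistic guarantee is available.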

Instead, we now consider the $(\epsilon,\delta)$-probabilistic DP relaxation, first considering the left-hand inequality of the DP definition (Def. 1):

$$\frac{1}{\exp(\epsilon)} \leq \frac{\mathbb{P}(\mathcal{M}_P(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_P(\mathbf{a}')=\mathbf{b})} \quad\Rightarrow\quad b_k \geq \frac{1-\epsilon}{\log\left(\frac{a_k+\alpha}{a_k-1+\alpha}\right)}.$$

When $\epsilon\geq 1$, this inequality holds with probability 1. When $0<\epsilon<1$, the probability that this inequality holds can be determined through the Poisson’s CDF, since $b_k$ is a realization from a Poisson random variable. This probability is given as:

$$1-F_{a_k+\alpha}^{P}\left[\frac{1-\epsilon}{\log\left(\frac{a_k+\alpha}{a_k-1+\alpha}\right)}\right], \tag{7}$$

where $F_{a_k+\alpha}^{P}$ is the CDF of the Poisson distribution with mean $a_k+\alpha$.

We next consider the right-hand inequality of Def. 1:

$$\frac{\mathbb{P}(\mathcal{M}_P(\mathbf{a})=\mathbf{b})}{\mathbb{P}(\mathcal{M}_P(\mathbf{a}')=\mathbf{b})} \leq \exp(\epsilon) \quad\Rightarrow\quad b_k \leq \frac{1+\epsilon}{\log\left(\frac{a_k+\alpha}{a_k-1+\alpha}\right)}.$$

For all $\epsilon$, this inequality holds with probability

$$F_{a_k+\alpha}^{P}\left[\frac{1+\epsilon}{\log\left(\frac{a_k+\alpha}{a_k-1+\alpha}\right)}\right]. \tag{8}$$
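The two probabilities can be evaluated for a single cell as in the Python sketch below, which uses only the standard library. It is illustrative rather than definitive: the function names and the example values are assumptions. The left-hand probability is computed directly as $\mathbb{P}(b_k \geq \text{lower limit})$, which coincides with (7) whenever the limit is not an integer. The simple CDF summation is intended for small means only.

```python
import math


def poisson_cdf(x, mean):
    """P(X <= x) for X ~ Poisson(mean); x need not be an integer."""
    if x < 0:
        return 0.0
    pmf = math.exp(-mean)  # P(X = 0); underflows for very large means
    total = pmf
    for k in range(1, math.floor(x) + 1):
        pmf *= mean / k
        total += pmf
    return min(total, 1.0)


def thresholds(a_k, alpha, eps):
    """Lower and upper limits on b_k implied by the two DP inequalities."""
    log_ratio = math.log((a_k + alpha) / (a_k - 1 + alpha))
    return (1 - eps) / log_ratio, (1 + eps) / log_ratio


def prob_left(a_k, alpha, eps):
    """P(b_k >= lower limit), cf. (7)."""
    lower, _ = thresholds(a_k, alpha, eps)
    return 1.0 - poisson_cdf(math.ceil(lower) - 1, a_k + alpha)


def prob_right(a_k, alpha, eps):
    """P(b_k <= upper limit), i.e. expression (8)."""
    _, upper = thresholds(a_k, alpha, eps)
    return poisson_cdf(upper, a_k + alpha)


p7 = prob_left(a_k=3, alpha=0.5, eps=0.5)
p8 = prob_right(a_k=3, alpha=0.5, eps=0.5)
print(f"left-hand inequality: {p7:.3f}   right-hand inequality: {p8:.3f}")
```

For $\epsilon \geq 1$ the lower limit is at most zero, so `prob_left` returns 1, in line with the statement that the left-hand inequality then always holds.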

Recall that in $(\epsilon,\delta)$-probabilistic DP, $1-\delta$ is the probability that DP is satisfied, i.e. the probability that both inequalities hold. A non-trivial question when $0<\epsilon<1$ is how to combine the probabilities given in (7) and (8) and hence compute $\delta$. This is an area for future research.

When $\epsilon\geq 1$, however, the left-hand inequality of Def. 1 always holds, thus we need only focus on (8). Although it is non-trivial to show, for any $\epsilon\geq 1$ and $\alpha>0$, (8) is minimised when $a_k=1$ (i.e. when $a_k'=0$). A formal proof is omitted here, but this is supported by extensive empirical simulations. Thus,

$$1-\delta = F_{1+\alpha}^{P}\left[\frac{1+\epsilon}{\log\left(\frac{1+\alpha}{\alpha}\right)}\right]. \tag{9}$$

This also demonstrates the role of $\alpha$ as a tuning parameter for risk. In general, a larger $\alpha$ value corresponds to a lower $\delta$ value. Yet $\delta$ is not a monotonically decreasing function of $\alpha$. Briefly, this is because increasing $\alpha$ increases the value of the expression inside the square brackets in (9), but it also increases the mean of the Poisson random variable from which a synthetic count is drawn. Figure 1 illustrates the nature of the relationship between $\alpha$ and $\delta$ for different values of $\epsilon$. For example, setting $\alpha=0.1$ satisfies approximately $(3,0.3)$-probabilistic DP and $(1.5,0.6)$-probabilistic DP.

Figure 1: The relationship between $\alpha$ and $\delta$ in the Poisson synthesis mechanism for $\epsilon=1.5$ and $\epsilon=3$.
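The relationship in (9) can be explored with a few lines of Python. The sketch below is a minimal illustration using only the standard library; the function names and the grid of $\alpha$ values are assumptions made for the example. Because the Poisson CDF is a step function, $\delta$ drops whenever the bracketed term in (9) crosses an integer and rises slowly in between, as the Poisson mean $1+\alpha$ grows.

```python
import math


def poisson_cdf(x, mean):
    """P(X <= x) for X ~ Poisson(mean); x need not be an integer."""
    if x < 0:
        return 0.0
    pmf = math.exp(-mean)  # P(X = 0)
    total = pmf
    for k in range(1, math.floor(x) + 1):
        pmf *= mean / k
        total += pmf
    return min(total, 1.0)


def delta(alpha, eps):
    """delta implied by (9) for a pseudocount alpha > 0 and eps >= 1."""
    threshold = (1 + eps) / math.log((1 + alpha) / alpha)
    return 1.0 - poisson_cdf(threshold, 1 + alpha)


for alpha in (0.05, 0.1, 0.15, 0.2, 0.5, 1.0):
    print(f"alpha={alpha:<5} delta(eps=3)={delta(alpha, 3.0):.3f}")
```

With $\epsilon=3$, moving from $\alpha=0.1$ to $\alpha=0.15$ raises $\delta$ slightly, whereas moving on to $\alpha=0.2$ lowers it, which is one instance of the non-monotonic behaviour described above.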

In contingency tables where there are no zero counts, an $(\epsilon,\delta)$-DP guarantee can be obtained when $\alpha=0$. In this instance, $\delta$ is determined by the smallest original count, i.e.:

$$1-\delta = F_{a_i+\alpha}^{P}\left[\frac{1+\epsilon}{\log\left(\frac{\min_i a_i+1}{\min_i a_i}\right)}\right]. \tag{10}$$

In a sense, in this example we have violated the traditional $(\epsilon,\delta)$-probabilistic DP definition given in (3) because $\delta$ is dependent on a particular set of original counts $\mathbf{a}$ – not all original counts.

We can easily replace the Poisson with any other count distribution (e.g. the negative binomial, Poisson inverse-Gaussian, Delaporte, Sichel, etc.), which of course would lead to a different expression for the ratio in (6).

5 An empirical example

5.1 The English School Census administrative database

The English School Census (ESC) is a large administrative database belonging to the UK’s Department for Education (DfE), which holds information about pupils attending state-funded schools in the UK. Owing to the presence of sensitive data, strict privacy guarantees would be required for data from the ESC to be made available to researchers. There is therefore great appeal to DP-type approaches, where more formal guarantees of privacy can be obtained.

Access to the real ESC data is currently restricted, even for the sake of demonstrating the effectiveness of privacy methods. For this reason, staff at the Office for National Statistics (ONS) created a substitute data set using publicly available data sources, such as published ESC data and 2011 UK census data. A key feature of this data set, ESC$_{\text{rep}}$, is that it replicates some of the statistical properties present in the actual ESC. We take a subset of this data which has approximately $8\times 10^{6}$ individuals (rows) and 5 categorical variables (columns). As all variables are categorical, the data set can be expressed as a contingency table with around $3.5\times 10^{6}$ cells. More information about the data set – as well as the data set itself – is available at Blanchard et al. (2022).

5.2 Applying the Poisson synthesis mechanism

We now apply the Poisson synthesis mechanism to the ESC$_{\text{rep}}$ data, considering different values of $\alpha$ and values of $\epsilon>1$.

Figure 2 gives combinations of $(\epsilon,\delta)$ values that can be achieved for the ESC$_{\text{rep}}$ data when using $\alpha$ values of 0.1, 0.2, 0.5 and 1. For example, when $\epsilon=2$, an $\alpha$ value of 1 is required to obtain a $\delta$ value of 0.05; when $\alpha=0.1$, a $\delta$ value of 0.05 is obtained only for $\epsilon$ values greater than 6.

DP methods in general are known to have a detrimental effect on utility. To gain a simple insight into general utility (Snoke et al., 2018), the boxplots in Figure 3 compare the percentage differences between original and synthetic counts for various values of $\alpha$, and for original counts between 1 and 10. Unsurprisingly, increasing $\alpha$ increases the percentage differences, i.e. has an adverse effect on utility. This loss of utility is magnified in specific analyses, especially when the analyst wishes to quantify uncertainty.

Figure 2: Combinations of $\delta$ such that $(\epsilon,\delta)$-probabilistic DP is achieved when the Poisson is used, for various $\max_i a_i$ and $\epsilon$ equal to 1.5, 2, 2.5 and 3.

Figure 3: For different values of $\alpha$, boxplots showing percentage differences between original and synthetic counts (utility) for original counts in the range 1–10.
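The kind of comparison summarised in Figure 3 can be mimicked on simulated cells. The Python sketch below is an illustration only and does not use the ESC$_{\text{rep}}$ data. It assumes that the percentage difference is defined as $100\,(b_i-a_i)/a_i$, and the function names, the number of draws and the inversion sampler (suitable for small means only) are likewise assumptions.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by inverting the CDF (small means only)."""
    u = rng.random()
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def pct_differences(alpha, n_draws, rng):
    """Percentage differences 100 * (b - a) / a for original counts a = 1..10."""
    diffs = []
    for a in range(1, 11):
        for _ in range(n_draws):
            b = sample_poisson(a + alpha, rng)
            diffs.append(100.0 * (b - a) / a)
    return diffs


rng = random.Random(2024)
for alpha in (0.1, 0.2, 0.5, 1.0):
    d = pct_differences(alpha, n_draws=2000, rng=rng)
    q1, med, q3 = statistics.quantiles(d, n=4)  # quartiles, as in a boxplot
    print(f"alpha={alpha}: mean={statistics.mean(d):.1f}%  "
          f"quartiles=({q1:.1f}, {med:.1f}, {q3:.1f})")
```

Since each synthetic count has mean $a_i+\alpha$, the average percentage difference grows with $\alpha$, which is the adverse effect on utility described above.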

6 Discussion

To summarise, in this paper we have shown how to obtain $(\epsilon,\delta)$-DP guarantees when using a Poisson synthesis mechanism to protect the privacy of counts in contingency tables. For a given $\epsilon>1$, the corresponding value of $\delta$ that is achievable with the Poisson is relatively high; much higher than that which is achievable with other DP mechanisms. Going forward, we believe other count distributions, such as the negative binomial, are likely to be more favourable (i.e. will give better utility results), while also providing the same DP-type risk guarantees, because such distributions would introduce further tuning parameters in addition to $\alpha$. Previous work suggests that such tuning parameters apply noise in a more efficient fashion (Jackson et al., 2022a). These tuning parameters could be set to obtain certain $\epsilon$ or $\delta$ values.

We end with an interesting note in relation to DP. Somewhat counterintuitively, the reason why multinomial-based synthesis mechanisms (e.g. the multinomial Dirichlet synthesizer) can satisfy $\epsilon$-DP, but the Poisson cannot, is that multinomial mechanisms have a maximum synthetic count to which any original count can be synthesized, namely $n$. With count distributions, any original count can be synthesized to any non-negative integer. To help explain why this causes the DP definition to fail, recall that, with contingency tables, DP definitions effectively assume that the intruder is trying to locate the cell to which just one individual belongs; i.e. in the intruder’s data set one, and only one, cell count is one less than it actually is. Suppose that a particular count in the intruder’s data set is equal to 1, but that the corresponding synthetic count, generated by simulating from the Poisson with $\alpha=0$, is equal to 5. From (6), it is $\exp(-1)\times 2^{5}\approx 11.8$ times more likely that this synthetic count originated from a cell with a count of 2 than from a cell with a count of 1; the intruder can therefore infer that that particular cell is a likely origin of the target. It is interesting therefore that, with DP, disclosure risk is deemed to be at its greatest when the scope for potential movement between original and synthetic counts is at its greatest. This largely goes against the objectives of traditional SDC methods, which typically reduce risk by increasing the divergence from the original counts.

References

  • Abowd and Vilhuber (2008) Abowd, J. M. and Vilhuber, L. (2008) How Protective Are Synthetic Data? In Privacy in Statistical Databases 2008 (eds. J. Domingo-Ferrer and Y. Saygın), 239–246. Berlin, Heidelberg: Springer.
  • Balle and Wang (2018) Balle, B. and Wang, Y.-X. (2018) Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In Proceedings of the 35th International Conference on Machine Learning (eds. J. Dy and A. Krause), vol. 80 of Proceedings of Machine Learning Research, 394–403. PMLR. URL: https://proceedings.mlr.press/v80/balle18a.html.
  • Blanchard et al. (2022) Blanchard, S., Jackson, J. E., Mitra, R., Francis, B. J. and Dove, I. (2022) A constructed English School Census substitute. URL: 10.17635/lancaster/researchdata/533.
  • Bowen and Liu (2020) Bowen, C. M. and Liu, F. (2020) Comparative Study of Differentially Private Data Synthesis Methods. Statistical Science, 35, 280–307. URL: https://doi.org/10.1214/19-STS742.
  • Charest (2011) Charest, A.-S. (2011) How can we analyze differentially-private synthetic datasets? Journal of Privacy and Confidentiality, 2. URL: https://journalprivacyconfidentiality.org/index.php/jpc/article/view/589.
  • Drechsler (2023) Drechsler, J. (2023) Differential privacy for government agencies—are we there yet? Journal of the American Statistical Association, 118, 761–773. URL: https://doi.org/10.1080/01621459.2022.2161385.
  • Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K. and Smith, A. (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography (eds. S. Halevi and T. Rabin), 265–284. Berlin, Heidelberg: Springer.
  • Dwork and Roth (2014) Dwork, C. and Roth, A. (2014) The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9, 211–407. URL: http://dx.doi.org/10.1561/0400000042.
  • Goetz et al. (2012) Goetz, M., Machanavajjhala, A., Wang, G., Xiao, X. and Gehrke, J. (2012) Publishing Search Logs - A Comparative Study of Privacy Guarantees. IEEE Trans. Knowl. Data Eng., 24, 520–532.
  • Jackson et al. (2022a) Jackson, J., Mitra, R., Francis, B. and Dove, I. (2022a) On integrating the number of synthetic data sets $m$ into the a priori synthesis approach. In Privacy in Statistical Databases 2022 (eds. J. Domingo-Ferrer and M. Laurent), 205–219. Cham: Springer International Publishing.
  • Jackson et al. (2022b) — (2022b) Using Saturated Count Models for User-Friendly Synthesis of Large Confidential Administrative Databases. Journal of the Royal Statistical Society Series A: Statistics in Society, 185, 1613–1643. URL: https://doi.org/10.1111/rssa.12876.
  • Machanavajjhala et al. (2008) Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J. and Vilhuber, L. (2008) Privacy: Theory meets practice on the map. In 2008 IEEE 24th international conference on data engineering, 277–286. IEEE.
  • McClure and Reiter (2012) McClure, D. and Reiter, J. P. (2012) Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data. Transactions on Data Privacy, 5, 535–552.
  • Quick (2021) Quick, H. (2021) Generating Poisson-distributed differentially private synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184, 1093–1108. URL: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12711.
  • Rinott et al. (2018) Rinott, Y., O’Keefe, C. M., Shlomo, N., Skinner, C. et al. (2018) Confidentiality and Differential Privacy in the Dissemination of Frequency Tables. Statistical Science, 33, 358–385.
  • Snoke et al. (2018) Snoke, J., Raab, G. M., Nowok, B., Dibben, C. and Slavkovic, A. (2018) General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 181, 663–688.