Rate-Distortion-Perception Tradeoff for Gaussian Vector Sources

**g**g Qian, Sadaf Salehkalaibar, Jun Chen, Ashish Khisti, Wei Yu,
Wuxian Shi, Yiqun Ge and Wen Tong
**g**g Qian and Jun Chen are with the Department of Electrical and Computer Engineering at McMaster University, Hamilton, ON L8S 4K1, Canada (email: {qianj40, chenjun}@mcmaster.ca). Sadaf Salehkalaibar, Ashish Khisti and Wei Yu are with the Department of Electrical and Computer Engineering at the University of Toronto, Toronto, M5S 3G4, Canada (email: {sadafs, akhisti, weiyu}@ece.utoronto.ca). Wuxian Shi, Yiqun Ge and Wen Tong are with the Ottawa Research Center, Huawei Technologies, Ottawa, ON K2K 3J1, Canada (email: {wuxian.shi, yiqun.ge, tongwen}@huawei.com).
Abstract

This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem, where the goal is to compress a multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. We study this RDP setting with either the Kullback-Leibler (KL) divergence or the Wasserstein-2 metric as the perception loss function, and show that for Gaussian vector sources, jointly Gaussian reconstructions are optimal. We further demonstrate that the optimal tradeoff can be expressed as an optimization problem that can be solved explicitly. The optimal solution has the following interesting property. Without the perception constraint, the traditional reverse water-filling solution for the rate-distortion (RD) tradeoff of a Gaussian vector source states that the optimal rate allocated to each component depends on a constant, called the water-level. If the variance of a specific component is below the water-level, it is assigned a zero compression rate. However, with active distortion and perception constraints, we show that the optimal rates allocated to the different components are always positive. Moreover, the water-levels that determine the optimal rate allocation for different components are unequal. We further treat the special case of perceptually perfect reconstruction and study its RDP function in the high-distortion and low-distortion regimes to obtain insight into the structure of the optimal solution.

Index Terms:
Rate-distortion-perception function, lossy source coding, lossy compression, Gaussian vector sources, reverse water-filling

I Introduction

The rate-distortion-perception (RDP) function is a generalization of Shannon’s rate-distortion function that incorporates an additional perception loss function which measures the distance between the distributions of the source and the reconstruction. It has been observed that in the neural compression framework [1, 2, 3, 4], improving realism in the reconstruction comes at the price of increased distortion. In this framework, realism is controlled by a perception loss function between the distributions of the source and the reconstruction, while distortion is controlled via a standard distortion loss function on the samples of the source and its reconstruction, e.g., in terms of mean squared error. The RDP function introduced in Blau and Michaeli [5] formalizes this tradeoff.

The extension of classical rate-distortion (RD) theory to incorporate constraints on the distribution of the reconstruction samples has been studied in various works in the information theory literature; see, e.g., [6] and the references therein. More recently, Theis and Wagner [7] presented a one-shot coding theorem based on the strong functional representation lemma (SFRL) [8] to establish the operational validity of the RDP function [5]. In [9], the authors establish analytic properties of the RDP function for the special case of (scalar) Gaussian sources, with a quadratic distortion function and a perception loss function given by either the Kullback–Leibler (KL) divergence or the Wasserstein-2 distance between the source and reconstruction distributions. The role of common randomness in the RDP setting has been investigated in [10, 11]. Furthermore, the distortion-perception tradeoff with squared-error distortion and Wasserstein-2 perception loss, but without an explicit compression rate constraint, has been studied in [12, 13], where it is shown that the entire tradeoff curve can be achieved by interpolating between the two extremal reconstructions based on a given representation. Other related works include [14, 15].

This paper studies the RDP function of a Gaussian vector source under squared-error distortion, with either the KL divergence or the Wasserstein-2 distance as the perception loss metric. Our result thus extends prior work [9] on scalar Gaussian sources to vector sources. We start by demonstrating the optimality of jointly Gaussian reconstructions for Gaussian vector sources in the RDP setting. We then show that, by decomposing the Gaussian vector source using the unitary transformation obtained from the eigenvalue decomposition of its covariance matrix, one can derive an achievable RDP function of the Gaussian vector source in terms of the RDP functions of its constituent scalar components. A matching converse establishes the optimality of this achievable scheme, so the optimal RDP function can be characterized as the solution of an optimization problem. We explicitly solve this optimization problem and investigate structural properties of the optimal solution.

The optimal RDP function for the Gaussian vector source has the following interesting property. Without the perception constraint, the rate-distortion function of a parallel Gaussian source model has a classical reverse water-filling characterization [16, Thm 10.3], where the optimal rate allocation across the components is determined by a distortion-dependent parameter called the water-level. A positive rate is assigned to those components whose variance is above this parameter, and any component whose variance is below the water-level is assigned zero rate; see Fig. 1(a). With a perception constraint, however, we observe a qualitatively different solution, as shown in Fig. 1(b). First, unlike in reverse water-filling, the water-level associated with each component can be different and is characterized as the solution to a set of equations. Second, while reverse water-filling assigns zero rate to the source components whose variances are below the water-level, in the RDP setting every component is assigned a nonzero rate as long as both the distortion and perception constraints are active.

Figure 1: (a) Without a perception constraint, the traditional reverse water-filling solution for a parallel Gaussian source fixes a constant water-level. When the variance of a specific component is less than the water-level, it is assigned zero rate. (b) With an active perception constraint, unequal water-levels are assigned to different components. The variance of each component is always greater than the corresponding water-level. Every component has a positive rate.

We further consider the special case of zero perception loss (so that the source and reconstruction distributions are identical) and establish analytical results for this case. Moreover, we present asymptotic results for the high- and low-distortion regimes with zero perception loss, which provide additional insight into the difference between the RDP function and the RD function.

The rest of the paper is organized as follows. In Section II, we introduce the system model and some preliminaries. Section III reviews the traditional reverse water-filling solution. In Section IV, we present the generalized water-filling solution for both the KL divergence and the Wasserstein-2 distance as perception metrics, discuss properties of the RDP function for perfect perceptual reconstruction, and provide the asymptotic analysis for the low- and high-distortion regimes.

Notation: We denote entropy, differential entropy and mutual information by $H(\cdot)$, $h(\cdot)$ and $I(\cdot;\cdot)$, respectively. The cardinality of the set $\mathcal{X}$ is written as $|\mathcal{X}|$. We use $P_X$ to denote the probability distribution function of a random vector $X$. We use $\mathcal{N}(\mu,\Sigma)$ to denote the Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. We use $\mathbb{E}[\cdot]$ to denote the expectation operator, and $\mathbb{R}$ to denote the set of real numbers. Throughout this paper, the base of the logarithm function is $e$.

II System Model and Preliminaries

Let $X\sim P_X$ be an $L$-dimensional Gaussian vector source with mean $0$ and covariance matrix $\Sigma_X\succ 0$. Consider the eigenvalue decomposition of $\Sigma_X$:

$$\Sigma_X = \Theta^T\Lambda_X\Theta, \tag{1}$$

where $\Theta$ is unitary and $\Lambda_X$ is the diagonal matrix of positive eigenvalues¹

$$\Lambda_X = \text{diag}^L(\lambda_1,\ldots,\lambda_L). \tag{2}$$

¹ Note that if some of the eigenvalues are zero, the corresponding rows of the unitary matrix $\Theta$ (i.e., the corresponding eigenvectors) can be removed, yielding a diagonal $\Lambda_X$ of lower dimension. The rest of the derivation follows in the same way.
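As a numerical illustration of (1)-(2), the following minimal sketch computes $\Theta$ and $\Lambda_X$ with NumPy. The covariance matrix is a hypothetical example, not taken from the paper. One convention detail matters: `numpy.linalg.eigh` returns eigenvectors as the columns of `v` with $\Sigma_X = v\,\mathrm{diag}(w)\,v^T$, so the $\Theta$ of (1) corresponds to `v.T`.

```python
import numpy as np


def eig_decompose(sigma_x):
    """Return (theta, lam) with sigma_x = theta.T @ diag(lam) @ theta, as in (1)-(2)."""
    # eigh handles symmetric matrices: it returns ascending eigenvalues w and an
    # orthonormal matrix v whose COLUMNS are eigenvectors, so sigma_x = v @ diag(w) @ v.T.
    w, v = np.linalg.eigh(sigma_x)
    if np.any(w <= 0):
        raise ValueError("Sigma_X must be positive definite")
    theta = v.T  # rows of theta are eigenvectors, matching the form Theta^T Lambda Theta
    return theta, w


# Hypothetical 3x3 covariance, chosen only for illustration.
sigma_x = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 0.5]])
theta, lam = eig_decompose(sigma_x)
print(lam)
print(np.allclose(theta.T @ np.diag(lam) @ theta, sigma_x))  # True
print(np.allclose(theta @ theta.T, np.eye(3)))               # True
```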

We assume that there is unlimited common randomness $K\in\mathcal{K}$ shared between the encoder and the decoder. Consider the following one-shot encoding and decoding functions, where the source samples are encoded one at a time:

$$f\colon \mathbb{R}^L\times\mathcal{K}\to\mathcal{M}, \tag{3}$$
$$g\colon \mathcal{M}\times\mathcal{K}\to\mathbb{R}^L. \tag{4}$$

Here, $\mathcal{M}$ denotes the set of messages. Let $P_{\hat{X}}$ be the distribution of the reconstruction induced by the encoding and decoding mechanisms. In this paper, we measure distortion using the squared-error loss function $d\colon\mathbb{R}^L\times\mathbb{R}^L\to\mathbb{R}_{\geq 0}$, where $d(x,\hat{x}) := \|x-\hat{x}\|^2$. To capture perceptual quality, for given probability distributions $P_X$ and $P_{\hat{X}}$, we use $\phi(P_X,P_{\hat{X}})$ to denote the perception loss function measuring the difference between the two distributions. For both perception metrics considered in this paper, $\phi(P_X,P_{\hat{X}})=0$ if and only if $P_X=P_{\hat{X}}$.

The above framework is referred to as the one-shot setting because it compresses one sample at a time. We can also consider encoding $n$ independent and identically distributed (i.i.d.) samples $X^n=(X_1,\ldots,X_n)$ and reconstructing $\hat{X}^n=(\hat{X}_1,\ldots,\hat{X}_n)$, in the asymptotic regime $n\to\infty$.

Definition 1 (Operational RDP Functions)

Let $X\sim P_X$. For given distortion-perception constraints $(D,P)$, a rate $R$ is said to be achievable if there exist encoding and decoding functions satisfying

$$\mathbb{E}[\ell(M)] \leq R, \tag{5}$$
$$\mathbb{E}[\|X-\hat{X}\|^2] \leq D, \tag{6}$$
$$\phi(P_X,P_{\hat{X}}) \leq P, \tag{7}$$

where $\ell(M)$ denotes the length of the message $M$ used to encode one sample. The infimum of all achievable rates $R$ is called the one-shot rate-distortion-perception (RDP) function, denoted by $R^o(D,P)$.

For the asymptotic setting, given distortion-perception constraints $(D,P)$, a rate $R$ is said to be achievable if there exist encoding and decoding functions such that

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\mathbb{E}[\|X_i-\hat{X}_i\|^2] \leq D, \tag{8}$$
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\phi(P_{X_i},P_{\hat{X}_i}) \leq P, \tag{9}$$

with the message $M$ that encodes $X^n$ satisfying

$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ell(M)] \leq R. \tag{10}$$

The infimum of all achievable rates is called the asymptotic RDP function, denoted by $R^{\infty}(D,P)$.

Definition 2 (Information RDP Function)

For given $X\sim P_X$ and fixed $(D,P)$, let $\mathcal{P}_{\hat{X}|X}(D,P)$ be the set of conditional distributions $P_{\hat{X}|X}$ such that

$$\mathbb{E}[\|X-\hat{X}\|^2]\leq D, \qquad \phi(P_X,P_{\hat{X}})\leq P. \tag{11}$$

The information rate-distortion-perception (RDP) function is defined as

$$R(D,P)=\inf_{P_{\hat{X}|X}\in\mathcal{P}_{\hat{X}|X}(D,P)} I(X;\hat{X}). \tag{12}$$

As explained in detail later, using the SFRL [8] and following steps similar to those of Theorem 2 and Theorem 5 in Appendix A.2 of [9], one can show that

$$R(D,P)\leq R^o(D,P)\leq R(D,P)+\log(R(D,P)+1)+5 \tag{13}$$

and

$$R^{\infty}(D,P)=R(D,P). \tag{14}$$

Consequently, at high rate the one-shot operational RDP function $R^o(D,P)$ is close to the information RDP function $R(D,P)$, and hence to the asymptotic RDP function $R^{\infty}(D,P)$, since the gap $\log(R(D,P)+1)+5$ in (13) grows only logarithmically in $R(D,P)$.
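To make the high-rate statement concrete, the short sketch below evaluates the upper bound in (13) for a few illustrative values of $R(D,P)$ (in nats, matching the paper's natural-log convention). The chosen rates are arbitrary; the point is only that the ratio of the bound to $R(D,P)$ approaches 1 as the rate grows.

```python
import math


def one_shot_upper_bound(r):
    """Right-hand side of (13): R + log(R + 1) + 5, with natural log (nats)."""
    return r + math.log(r + 1.0) + 5.0


# Arbitrary illustrative rates; the relative gap (ub / r) shrinks as r grows.
for r in [1.0, 10.0, 100.0, 1000.0]:
    ub = one_shot_upper_bound(r)
    print(f"R={r:7.1f}  upper bound={ub:9.3f}  ratio={ub / r:.4f}")
```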

In the rest of the paper, the perception metric $\phi(P_X,P_{\hat{X}})$ is assumed to be either the KL divergence, i.e.,

$$D(P_{\hat{X}}\|P_X)=\int_x P_{\hat{X}}(x)\log\frac{P_{\hat{X}}(x)}{P_X(x)}\,dx, \tag{15}$$

or the (squared) Wasserstein-2 distance, i.e.,

$$W_2^2(P_X,P_{\hat{X}})=\inf\mathbb{E}[\|X-\hat{X}\|^2], \tag{16}$$

where the infimum is taken over all joint distributions of $(X,\hat{X})$ with marginals $P_X$ and $P_{\hat{X}}$.
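Since Theorem 2 below restricts attention to Gaussian reconstructions, both metrics (15) and (16) only need to be evaluated between zero-mean Gaussians. The sketch below uses the standard closed forms $D(\mathcal{N}(0,\hat\Sigma)\|\mathcal{N}(0,\Sigma))=\tfrac12[\mathrm{tr}(\Sigma^{-1}\hat\Sigma)-L+\log\det\Sigma-\log\det\hat\Sigma]$ and $W_2^2=\mathrm{tr}\,\Sigma+\mathrm{tr}\,\hat\Sigma-2\,\mathrm{tr}\big((\Sigma^{1/2}\hat\Sigma\Sigma^{1/2})^{1/2}\big)$. These formulas are well-known results rather than statements from this section, and the function names are hypothetical.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a symmetric PSD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T


def kl_gauss(sigma_hat, sigma):
    """D(N(0, sigma_hat) || N(0, sigma)) in nats; same argument order as (15)."""
    dim = sigma.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_sigma_hat = np.linalg.slogdet(sigma_hat)
    trace_term = np.trace(np.linalg.solve(sigma, sigma_hat))  # tr(sigma^{-1} sigma_hat)
    return 0.5 * (trace_term - dim + logdet_sigma - logdet_sigma_hat)


def w2_sq_gauss(sigma, sigma_hat):
    """Squared Wasserstein-2 distance (16) between N(0, sigma) and N(0, sigma_hat)."""
    root = psd_sqrt(sigma)
    cross = psd_sqrt(root @ sigma_hat @ root)
    return np.trace(sigma) + np.trace(sigma_hat) - 2.0 * np.trace(cross)


# Hypothetical diagonal example: the KL value should equal 0.5 * (0.25 + 1 - 2 + log 4),
# and the W2^2 value should equal (2 - 1)^2 + (1 - 1)^2.
sigma = np.diag([4.0, 1.0])
sigma_hat = np.diag([1.0, 1.0])
print(kl_gauss(sigma_hat, sigma))
print(w2_sq_gauss(sigma, sigma_hat))
```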

Before characterizing the RDP function, we first review the case of no perception constraint, which corresponds to traditional reverse water-filling for the classical rate-distortion function.

III Traditional Reverse Water-Filling

The classical rate-distortion theory for a parallel Gaussian source states that the optimal rate allocated to each component depends on a constant parameter, called the water-level, as shown in Fig. 1(a). The water-level also equals the distortion allotted to each component whose variance is above the water-level. For a given distortion $D$, let $\nu(D)$ be the solution to the equation

$$\sum_{\ell=1}^L\left[\lambda_\ell-\nu(D)\right]^+ = \left[\sum_{\ell=1}^L\lambda_\ell-D\right]^+, \tag{17}$$

where $[x]^+ := \max\{0,x\}$. Now, let

$$\gamma_\ell^*(D,\infty)=\begin{cases}\lambda_\ell & \text{if } \nu(D)\geq\lambda_\ell,\\ \nu(D) & \text{if } \nu(D)<\lambda_\ell.\end{cases} \tag{20}$$

The rate-distortion function for the Gaussian vector source whose $\ell$-th component has variance $\lambda_\ell$, $\ell\in\{1,\ldots,L\}$, is given as follows.

Theorem 1 (Thm 10.3 in [16])

For a Gaussian vector source, we have

$$R(D,\infty)=\frac{1}{2}\sum_{\ell=1}^L\log\frac{\lambda_\ell}{\gamma^*_\ell(D,\infty)}. \tag{21}$$

To simplify notation, we redefine the water-level of the $\ell$-th component as $\gamma_\ell^*(D,\infty)$, which accounts for the components whose variances are below the water-level: if $\lambda_\ell$ is below $\nu(D)$ for some $\ell$, then $\gamma_\ell^*(D,\infty)=\lambda_\ell$ and this component is assigned zero rate. Two special cases of the above theorem are of particular interest.
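Equations (17), (20) and (21) translate directly into a small numerical procedure. The sketch below solves (17) for $\nu(D)$ by bisection (one convenient choice, since the left-hand side of (17) is non-increasing in $\nu$), then forms the water-levels and the rate. In the hypothetical example $\lambda=(4,1,0.25)$ and $D=1.5$, the solution is $\nu=0.625$ (from $2\nu+0.25=1.5$), so the third component has $\gamma_3^*=\lambda_3$ and receives zero rate.

```python
import numpy as np


def reverse_water_filling(lam, D, tol=1e-12):
    """Solve (17) for nu(D) by bisection; return (nu, gamma_star, rate) per (20)-(21)."""
    lam = np.asarray(lam, dtype=float)
    total = lam.sum()
    if D <= 0:
        raise ValueError("D must be positive")
    if D >= total:
        # Right-hand side of (17) is 0: any nu >= max(lam) works and every rate is zero.
        return lam.max(), lam.copy(), 0.0
    target = total - D
    lo, hi = 0.0, lam.max()
    # f(nu) = sum_l [lam_l - nu]^+ is non-increasing, with f(lo) = total > target
    # and f(hi) = 0 < target, so bisection brackets the solution of (17).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.maximum(lam - mid, 0.0).sum() > target:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    gamma = np.minimum(lam, nu)                 # (20)
    rate = 0.5 * np.sum(np.log(lam / gamma))    # (21), in nats
    return nu, gamma, rate


nu, gamma, rate = reverse_water_filling([4.0, 1.0, 0.25], D=1.5)
print(nu, gamma, rate)
```

Note that the water-levels always sum to $D$, because $\sum_\ell\min\{\lambda_\ell,\nu\}=\sum_\ell\lambda_\ell-\sum_\ell[\lambda_\ell-\nu]^+$.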

Proposition 1 (High-Distortion Compression)

In the high-distortion regime, for sufficiently small $\epsilon>0$, we have

$$R\left(\sum_{\ell=1}^L\lambda_\ell-\epsilon,\infty\right)=\frac{\epsilon}{2\lambda^{\max}}+O(\epsilon^2), \tag{22}$$

where $\lambda^{\max}=\max_\ell\lambda_\ell$. Let $L^{\max}$ denote the set of indices whose corresponding eigenvalues equal $\lambda^{\max}$. Then, the water-levels are given by

$$\gamma_\ell^*\left(\sum_{\ell=1}^L\lambda_\ell-\epsilon,\infty\right)=\lambda_\ell, \qquad \forall\ell\in\{1,\ldots,L\}\setminus L^{\max}, \tag{23a}$$
$$\gamma^*_{\ell^{\max}}\left(\sum_{\ell=1}^L\lambda_\ell-\epsilon,\infty\right)=\lambda^{\max}-\frac{\epsilon}{|L^{\max}|}, \qquad \forall\ell^{\max}\in L^{\max}. \tag{23b}$$
Proof:

See Appendix A-1. ∎

The above proposition states that, in the high-distortion regime, a positive rate is assigned only to the components with the largest eigenvalue.
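Proposition 1 can be checked numerically by plugging the water-levels (23a)-(23b) into (21) and comparing with the leading term $\epsilon/(2\lambda^{\max})$ of (22). In this sketch, "sufficiently small" is taken to mean that the common water-level $\lambda^{\max}-\epsilon/|L^{\max}|$ stays above the second-largest eigenvalue; that explicit threshold is our assumption for the illustration. The example eigenvalues $(4,4,1)$ are hypothetical and include a tie, so $|L^{\max}|=2$.

```python
import numpy as np


def high_distortion_check(lam, eps):
    """Evaluate (21) at the water-levels (23) and the leading term of (22)."""
    lam = np.asarray(lam, dtype=float)
    lmax = lam.max()
    is_max = np.isclose(lam, lmax)          # indicator of the index set L^max
    k = int(is_max.sum())                   # |L^max|
    second = lam[~is_max].max() if (~is_max).any() else 0.0
    # Assumed validity range: water-level lmax - eps/k must stay above `second`.
    if not 0 < eps < k * (lmax - second):
        raise ValueError("eps is too large for the water-levels in (23)")
    gamma = np.where(is_max, lmax - eps / k, lam)   # (23a)-(23b)
    exact = 0.5 * np.sum(np.log(lam / gamma))       # (21) at these water-levels
    approx = eps / (2.0 * lmax)                     # first-order term in (22)
    return gamma, exact, approx


gamma, exact, approx = high_distortion_check([4.0, 4.0, 1.0], eps=0.01)
print(gamma, exact, approx, exact - approx)
```

Only the two components with $\lambda_\ell=\lambda^{\max}$ get $\gamma_\ell^*<\lambda_\ell$, i.e., a positive rate, and the difference between the exact rate and $\epsilon/(2\lambda^{\max})$ is of order $\epsilon^2$.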

Proposition 2 (Low-Distortion Compression)

In the low-distortion regime, for sufficiently small $\epsilon>0$, we have

$$R(\epsilon,\infty)=\frac{1}{2}\sum_{\ell=1}^L\log\frac{L\lambda_\ell}{\epsilon}, \tag{24}$$

where the water-levels are given by

$$\gamma_\ell^*(\epsilon,\infty)=\frac{\epsilon}{L}, \qquad \forall\ell\in\{1,\ldots,L\}. \tag{25}$$
Proof:

See Appendix A-2. ∎

For low-distortion compression, according to the above proposition, the same water-level is assigned to all components.
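A matching sketch for Proposition 2. Here "sufficiently small" is made explicit as $\epsilon\leq L\min_\ell\lambda_\ell$, the condition under which the common water-level $\epsilon/L$ does not exceed any eigenvalue. This threshold is our reading for the illustration, not a statement quoted from the proposition. The eigenvalues in the example are hypothetical.

```python
import numpy as np


def low_distortion_rate(lam, eps):
    """Water-levels (25) and rate (24); assumes eps / L <= min(lam)."""
    lam = np.asarray(lam, dtype=float)
    L = lam.size
    if not 0 < eps <= L * lam.min():
        raise ValueError("(24)-(25) require 0 < eps <= L * min(lambda) in this sketch")
    gamma = np.full(L, eps / L)                   # (25): a common water-level
    rate = 0.5 * np.sum(np.log(L * lam / eps))    # (24), in nats
    # Sanity check: (24) agrees with (21) evaluated at the water-levels (25).
    assert np.isclose(rate, 0.5 * np.sum(np.log(lam / gamma)))
    return gamma, rate


gamma, rate = low_distortion_rate([4.0, 1.0, 0.25], eps=0.3)
print(gamma, rate)  # the water-levels sum to eps, the total distortion
```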

IV Rate-Distortion-Perception Function

IV-A Optimality of Gaussian Reconstruction

We first present a result indicating that for the two perception metrics (15) and (16) considered in this paper and for a Gaussian vector source, jointly Gaussian reconstruction is optimal.

Theorem 2

For a zero-mean Gaussian source $X$, if the perception metric is either the KL divergence or the Wasserstein-2 distance, then, without loss of optimality, the reconstruction $\hat{X}$ in the optimization problem (12) can be restricted to have zero mean and to be jointly Gaussian with $X$.

Proof:

See Appendix B. ∎

A common property of the two perception metrics that makes the above theorem hold is that, when the source is Gaussian, a jointly Gaussian reconstruction minimizes both metrics among all reconstructions with the same first- and second-order joint statistics. Theorem 2 implies that the optimization defining the RDP function can be restricted to jointly Gaussian distributions that satisfy the distortion and perception constraints.

IV-B RDP Function with KL Divergence as Perception Metric

In this section, we present the RDP function with the KL divergence as the perception metric, i.e., $\phi(P_X,P_{\hat{X}})=D(P_{\hat{X}}\|P_X)$. The results for the Wasserstein-2 distance as the perception metric are stated in the subsequent section. We present both the one-shot and the asymptotic RDP functions. As already mentioned, the one-shot RDP function $R^{o}(D,P)$ is close to the information RDP function $R(D,P)$ at high rate. Here we provide explicit constructions of both one-shot and asymptotic coding strategies that achieve $R(D,P)$ or come close to it.

The first step is to decompose the source using the eigenvalue decomposition in (1) and to define

\begin{equation}
Z=\Theta X.\tag{26}
\end{equation}
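The decomposition (1) is not reproduced in this section; the sketch below assumes it takes the form $\Sigma_X=\Theta^{T}\Lambda\Theta$ with $\Theta$ orthogonal and $\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_L)$, which is consistent with (26) and (30). It uses NumPy's `numpy.linalg.eigh` on a randomly generated covariance matrix to build $\Theta$, checks that $Z=\Theta X$ has a diagonal covariance, and checks that $\Theta^{T}$ undoes the transform. Note that `eigh` returns eigenvalues in ascending order, which may differ from the ordering used in (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 source covariance (symmetric positive definite).
A = rng.standard_normal((3, 3))
sigma_x = A @ A.T + 0.1 * np.eye(3)

# eigh returns eigenvalues w (ascending) and orthonormal eigenvectors as the
# columns of V, with sigma_x = V @ diag(w) @ V.T.
w, V = np.linalg.eigh(sigma_x)
theta = V.T                      # rows of Theta are eigenvectors, so Z = Theta @ X

# The covariance of Z = Theta X is Theta Sigma_X Theta^T, which should equal diag(w).
sigma_z = theta @ sigma_x @ theta.T
print(np.round(sigma_z, 10))

# Decorrelate a few samples and map them back, mirroring (26) and (30).
x = rng.multivariate_normal(np.zeros(3), sigma_x, size=5)  # rows are samples
z = x @ theta.T                  # each row is Theta @ x_i
x_back = z @ theta               # each row is Theta^T @ z_i
```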

The main idea is to construct a new Gaussian random vector $\hat{Z}$ and to use the channel simulation result of [8] to communicate $\hat{Z}$ to the decoder at rate $R$. The vector $\hat{Z}$ is designed to be correlated with $Z$ in a specific way so that the distortion and perception constraints $D$ and $P$, respectively, are satisfied. The correlation between $Z$ and $\hat{Z}$ is controlled by two sets of parameters, $\{\gamma_{\ell}\}_{\ell=1}^{L}$ and $\{\hat{\lambda}_{\ell}\}_{\ell=1}^{L}$, such that $0<\gamma_{\ell}\leq\lambda_{\ell}$ and $0<\hat{\lambda}_{\ell}\leq\lambda_{\ell}$. The optimal values of these parameters are determined later.

In effect, whereas in the classical rate-distortion setting $\hat{Z}$ is chosen to minimize the rate subject to a distortion constraint alone, here $\hat{Z}$ is chosen to satisfy both the distortion and the perception constraints. The decoder constructs this noisy version of $Z$ by exploiting the common randomness shared with the encoder.

Specifically, $\hat{Z}$ is a zero-mean random vector, jointly Gaussian with $Z$, such that the pairs $(Z_{\ell},\hat{Z}_{\ell})$, $\ell\in\{1,\ldots,L\}$, are mutually independent and

\begin{equation}
\mathrm{cov}(Z_{\ell},\hat{Z}_{\ell})=\begin{bmatrix}\lambda_{\ell}&\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}\\ \sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}&\hat{\lambda}_{\ell}\end{bmatrix}.\tag{27}
\end{equation}

With the above covariance structure, we can verify that $\gamma_{\ell}$ is the minimum mean-squared error (MMSE) of estimating $Z_{\ell}$ based on $\hat{Z}_{\ell}$: for jointly Gaussian variables, this MMSE equals $\lambda_{\ell}-\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})/\hat{\lambda}_{\ell}=\gamma_{\ell}$. That is,

\begin{equation}
\gamma_{\ell}=\mathbb{E}\left[(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}_{\ell}])^{2}\right].\tag{28}
\end{equation}
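One way to see that (27) is a valid joint law, and that $\gamma_\ell$ is the MMSE in (28), is to generate the pair constructively. The sketch below is a Monte Carlo check with illustrative parameter values: it draws $\hat{Z}_\ell\sim\mathcal{N}(0,\hat{\lambda}_\ell)$ and sets $Z_\ell=(c/\hat{\lambda}_\ell)\hat{Z}_\ell+\sqrt{\gamma_\ell}\,W$ with $c=\sqrt{\hat{\lambda}_\ell(\lambda_\ell-\gamma_\ell)}$ and $W\sim\mathcal{N}(0,1)$ independent. This backward-channel construction is our own illustration and is not necessarily how the channel simulation scheme of [8] operates.

```python
import math
import random

def sample_pair(lam, lam_hat, gamma, rng):
    """Draw one pair (Z_l, Zhat_l) whose covariance matches (27).

    Zhat_l ~ N(0, lam_hat); Z_l = (c / lam_hat) * Zhat_l + sqrt(gamma) * W with
    W ~ N(0, 1) independent and c = sqrt(lam_hat * (lam - gamma)).
    Then Var(Z_l) = (lam - gamma) + gamma = lam and Cov(Z_l, Zhat_l) = c.
    """
    c = math.sqrt(lam_hat * (lam - gamma))
    zhat = rng.gauss(0.0, math.sqrt(lam_hat))
    z = (c / lam_hat) * zhat + math.sqrt(gamma) * rng.gauss(0.0, 1.0)
    return z, zhat

lam, lam_hat, gamma = 2.0, 1.5, 0.5     # illustrative values with 0 < gamma <= lam
c = math.sqrt(lam_hat * (lam - gamma))
rng = random.Random(1)
pairs = [sample_pair(lam, lam_hat, gamma, rng) for _ in range(100_000)]
n = len(pairs)

var_z = sum(z * z for z, _ in pairs) / n          # should be close to lam
cov = sum(z * zh for z, zh in pairs) / n          # should be close to c
# For this Gaussian pair E[Z_l | Zhat_l] = (c / lam_hat) * Zhat_l, so the empirical
# squared error of that estimator should be close to gamma, as in (28).
mmse = sum((z - (c / lam_hat) * zh) ** 2 for z, zh in pairs) / n
dist = sum((z - zh) ** 2 for z, zh in pairs) / n  # should be close to lam - 2c + lam_hat
print(var_z, cov, mmse, dist)
```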

Now, to derive the one-shot RDP function $R^{o}(D,P)$, we can make use of a consequence of the SFRL [8, Theorem 1] to show that, when common randomness $K$ is available at both the encoder and the decoder, there exists a channel simulation scheme that allows $\hat{Z}_{\ell}$ to be reconstructed at the decoder at a communication rate of

\begin{equation}
I(Z_{\ell};\hat{Z}_{\ell})+\log\left(I(Z_{\ell};\hat{Z}_{\ell})+1\right)+5.\tag{29}
\end{equation}

After $\hat{Z}_{\ell}$, $\ell\in\{1,\ldots,L\}$, have been reconstructed at the decoder, we use the same unitary matrix $\Theta$ to transform $\hat{Z}$ back into $\hat{X}$, i.e.,

\begin{equation}
\hat{X}=\Theta^{T}\hat{Z}.\tag{30}
\end{equation}

The above scheme yields the following one-shot rate, distortion, and perception loss for the $\ell$-th component of $Z$, as functions of $\lambda_{\ell}$, $\hat{\lambda}_{\ell}$, and $\gamma_{\ell}$:

\begin{align}
R_{\ell}(\gamma_{\ell})&=\frac{1}{2}\log\left(\frac{\lambda_{\ell}}{\gamma_{\ell}}\right)+\log\left(\frac{1}{2}\log\left(\frac{\lambda_{\ell}}{\gamma_{\ell}}\right)+1\right)+5,\tag{31}\\
D_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell})&=\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell},\tag{32}\\
P_{\ell}(\hat{\lambda}_{\ell})&=\frac{1}{2}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right).\tag{33}
\end{align}
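The per-component quantities (31)-(33) are straightforward to evaluate. The sketch below is a direct transcription under two assumptions: logarithms are natural, and the constant $5$ in (31) is taken exactly as written, although the base in which [8] states its bound may differ. The function name `component_terms` is hypothetical.

```python
import math

def component_terms(lam, lam_hat, gamma):
    """One-shot rate (31), distortion (32) and KL perception loss (33) for one component.

    Returns (mutual information, one-shot rate, distortion, perception loss).
    Natural logarithms are used throughout; this choice of base is an assumption.
    """
    if not 0 < gamma <= lam or lam_hat <= 0:
        raise ValueError("need 0 < gamma <= lam and lam_hat > 0")
    mi = 0.5 * math.log(lam / gamma)                                      # I(Z_l; Zhat_l)
    rate = mi + math.log(mi + 1) + 5                                      # (31), via (29)
    distortion = lam - 2 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat   # (32)
    perception = 0.5 * (lam_hat / lam - 1 + math.log(lam / lam_hat))      # (33)
    return mi, rate, distortion, perception

print(component_terms(4.0, 3.0, 1.0))
print(component_terms(1.0, 1.0, 1.0))   # zero-rate point: only the constant overhead remains
```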

This yields an achievable one-shot RDP function of a Gaussian vector source, characterized as an optimization problem over the parameters $\hat{\lambda}_{\ell}$ and $\gamma_{\ell}$ of all components.

For the asymptotic setting, the achievable scheme is identical, except that a block of $n$ samples is compressed together. As $n\rightarrow\infty$, the logarithmic and constant terms in (31), normalized by $n$, become negligible. This leads to an upper bound on $R^{\infty}(D,P)$ that equals $R(D,P)$. This upper bound turns out to be tight, i.e., a matching converse can be proved, which gives the following characterization of $R(D,P)$.
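To see why the logarithmic and constant terms disappear, apply (29) to a block of $n$ samples whose per-sample mutual information is $I$, so that the block mutual information is $nI$; the per-sample rate is then $I+(\log(nI+1)+5)/n$. The sketch below tabulates this overhead for a few block lengths. It assumes natural logarithms and i.i.d. samples, and the value $I=\frac{1}{2}\log 4$ is only an example.

```python
import math

def per_sample_rate(mi_per_sample, n):
    """Per-sample rate from applying the bound (29) to a block of n i.i.d. samples (nats)."""
    block_mi = n * mi_per_sample                  # mutual information of the whole block
    return (block_mi + math.log(block_mi + 1) + 5) / n

mi = 0.5 * math.log(4.0)          # example: lambda / gamma = 4 for a single component
overheads = []
for n in (1, 10, 100, 1_000, 100_000):
    overhead = per_sample_rate(mi, n) - mi
    overheads.append(overhead)
    print(f"n={n:>6}: per-sample overhead = {overhead:.6f} nats")
```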

Theorem 3

The rate-distortion-perception function $R(D,P)$ for a Gaussian vector source with parameters defined by (1) and (2), with the KL divergence as the perception metric, is given by the solution to the following optimization problem:

\begin{align}
R(D,P)=\min_{\{\hat{\lambda}_{\ell},\gamma_{\ell}\}_{\ell=1}^{L}}\quad&\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}\tag{34a}\\
\text{s.t.}\quad&0<\gamma_{\ell}\leq\lambda_{\ell},\quad\ell\in\{1,\ldots,L\},\nonumber\\
&0\leq\hat{\lambda}_{\ell},\quad\ell\in\{1,\ldots,L\},\nonumber\\
&\sum_{\ell=1}^{L}D_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell})\leq D,\nonumber\\
&\sum_{\ell=1}^{L}P_{\ell}(\hat{\lambda}_{\ell})\leq P.\nonumber
\end{align}
Proof:

See Appendix C. ∎
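The sketch below evaluates the objective (34a) at a candidate point and returns `None` when the point violates a constraint of (34); it is a feasibility checker, not a solver. As an illustration it uses the classical test channel $\hat{\lambda}_\ell=\lambda_\ell-\gamma_\ell$, for which (32) reduces to $D_\ell=\gamma_\ell$: without a perception constraint ($P=\infty$) the rate is that of ordinary reverse water-filling, while a tight $P$ makes the same point infeasible. The sketch requires $\hat{\lambda}_\ell>0$ (at $\hat{\lambda}_\ell=0$ the KL term (33) is infinite); the eigenvalues, budgets, and function names are made up, and logarithms are natural.

```python
import math

def distortion(lam, lam_hat, gamma):
    """Per-component distortion (32)."""
    return lam - 2 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat

def perception(lam, lam_hat):
    """Per-component KL perception loss (33), in nats."""
    return 0.5 * (lam_hat / lam - 1 + math.log(lam / lam_hat))

def rdp_value(lams, gammas, lam_hats, D, P):
    """Objective (34a) at a candidate point, or None if a constraint of (34) fails."""
    if any(not 0 < g <= lam for g, lam in zip(gammas, lams)):
        return None
    if any(h <= 0 for h in lam_hats):  # excluded here: (33) is infinite at lam_hat = 0
        return None
    total_d = sum(distortion(lam, h, g) for lam, h, g in zip(lams, lam_hats, gammas))
    total_p = sum(perception(lam, h) for lam, h in zip(lams, lam_hats))
    if total_d > D or total_p > P:
        return None
    return 0.5 * sum(math.log(lam / g) for lam, g in zip(lams, gammas))

lams = [4.0, 1.0]
gammas = [0.5, 0.5]
test_channel = [lam - g for lam, g in zip(lams, gammas)]   # lam_hat = lam - gamma gives D_l = gamma

print(rdp_value(lams, gammas, test_channel, D=1.0, P=math.inf))   # feasible
print(rdp_value(lams, gammas, test_channel, D=1.0, P=0.05))       # perception budget too tight
```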

An interpretation of the above is as follows. For a given $(D,P)$, let $\gamma_{\ell}^{*}(D,P)$ and $\hat{\lambda}^{*}_{\ell}(D,P)$, $\ell\in\{1,\ldots,L\}$, be the optimal solution to (34). Comparing this with (21), we see that $\gamma^{*}_{\ell}(D,P)$ can be interpreted as the water-level for the $\ell$-th component, which determines the rate allocated to that component according to (34a); see Fig. 1(b).

IV-C Generalized Water-filling with KL Divergence as Perception Metric

We now analyze the solution to the optimization problem in Theorem 3. It can be shown that the optimization problem (34) is convex. Let $\nu_{1}$, $\nu_{2}$, $\{\xi_{\ell}\}_{\ell=1}^{L}$, and $\{\eta_{\ell}\}_{\ell=1}^{L}$ be nonnegative Lagrange multipliers associated with the distortion constraint, the perception constraint, the constraints $\gamma_{\ell}\leq\lambda_{\ell}$, and the constraints $\hat{\lambda}_{\ell}\geq 0$, respectively. For $\ell\in\{1,\ldots,L\}$, we have the first-order conditions

\begin{equation}
\frac{1}{2\gamma^{*}_{\ell}(D,P)}-\nu_{1}\sqrt{\frac{\hat{\lambda}^{*}_{\ell}(D,P)}{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}}-\xi_{\ell}=0,\tag{35}
\end{equation}

and

\begin{equation}
\nu_{1}\left(1-\sqrt{\frac{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}{\hat{\lambda}^{*}_{\ell}(D,P)}}\right)+\frac{\nu_{2}}{2}\left(\frac{1}{\lambda_{\ell}}-\frac{1}{\hat{\lambda}^{*}_{\ell}(D,P)}\right)-\eta_{\ell}=0.\tag{36}
\end{equation}
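As a sanity check on (35) and (36), one can differentiate the Lagrangian numerically. The sketch below assumes the Lagrangian of one component is $\frac{1}{2}\log\frac{\lambda_\ell}{\gamma_\ell}+\nu_1D_\ell+\nu_2P_\ell+\xi_\ell(\gamma_\ell-\lambda_\ell)-\eta_\ell\hat{\lambda}_\ell$, with $\nu_1$, $\nu_2$, $\xi_\ell$, and $\eta_\ell$ attached to the distortion constraint, the perception constraint, $\gamma_\ell\le\lambda_\ell$, and $\hat{\lambda}_\ell\ge 0$, respectively (our reading, consistent with the signs in (35) and (36)). Under that convention, the left-hand side of (35) is minus the partial derivative with respect to $\gamma_\ell$, and the left-hand side of (36) is the partial derivative with respect to $\hat{\lambda}_\ell$. The evaluation point and multiplier values are arbitrary.

```python
import math

def lagrangian_component(gamma, lam_hat, lam, nu1, nu2, xi, eta):
    """Assumed per-component Lagrangian of (34), natural logarithms."""
    d = lam - 2 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat     # (32)
    p = 0.5 * (lam_hat / lam - 1 + math.log(lam / lam_hat))        # (33)
    return (0.5 * math.log(lam / gamma) + nu1 * d + nu2 * p
            + xi * (gamma - lam) - eta * lam_hat)

def kkt_35(gamma, lam_hat, lam, nu1, xi):
    """Left-hand side of (35)."""
    return 1 / (2 * gamma) - nu1 * math.sqrt(lam_hat / (lam - gamma)) - xi

def kkt_36(gamma, lam_hat, lam, nu1, nu2, eta):
    """Left-hand side of (36)."""
    return (nu1 * (1 - math.sqrt((lam - gamma) / lam_hat))
            + 0.5 * nu2 * (1 / lam - 1 / lam_hat) - eta)

def central_diff(f, x, h=1e-6):
    """Second-order accurate numerical derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

lam, gamma, lam_hat = 2.0, 0.7, 1.1      # an arbitrary interior point
nu1, nu2, xi, eta = 0.8, 0.3, 0.1, 0.05  # arbitrary nonnegative multipliers

dL_dgamma = central_diff(
    lambda g: lagrangian_component(g, lam_hat, lam, nu1, nu2, xi, eta), gamma)
dL_dlamhat = central_diff(
    lambda h: lagrangian_component(gamma, h, lam, nu1, nu2, xi, eta), lam_hat)
print(dL_dgamma, -kkt_35(gamma, lam_hat, lam, nu1, xi))
print(dL_dlamhat, kkt_36(gamma, lam_hat, lam, nu1, nu2, eta))
```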

We first focus on the most interesting regime, in which the distortion and perception constraints are both active, so that $\nu_{1},\nu_{2}>0$, and in which $\gamma_{\ell}<\lambda_{\ell}$ and $\hat{\lambda}_{\ell}>0$, so that $\xi_{\ell}=\eta_{\ell}=0$ for all $\ell\in\{1,\ldots,L\}$. In this case, (35) implies that $\hat{\lambda}^{*}_{\ell}(D,P)$ can be expressed as

\begin{equation}
\hat{\lambda}_{\ell}^{*}(D,P)=\frac{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}{4\gamma^{*2}_{\ell}(D,P)\nu_{1}^{2}}.\tag{37}
\end{equation}

Together with (36), this means that $\gamma_{\ell}^{*}(D,P)$ is the solution in $(0,\lambda_{\ell})$ of the following equation:

\begin{equation}
\nu_{1}\left(1-2\nu_{1}\gamma^{*}_{\ell}(D,P)\right)=\frac{1}{2}\nu_{2}\left(\frac{4\gamma^{*2}_{\ell}(D,P)\nu_{1}^{2}}{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}-\frac{1}{\lambda_{\ell}}\right),\tag{38}
\end{equation}

which becomes quadratic in $\gamma^{*}_{\ell}(D,P)$ after multiplying both sides by $\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)$, and can therefore be solved analytically as follows:

\begin{equation}
\gamma^{*}_{\ell}(D,P)=\frac{-2\lambda_{\ell}\nu_{1}(1+2\lambda_{\ell}\nu_{1})-\nu_{2}+\sqrt{(\nu_{2}+2\lambda_{\ell}\nu_{1}+4\lambda_{\ell}^{2}\nu_{1}^{2})^{2}+16\lambda_{\ell}^{2}\nu_{1}^{2}(\nu_{2}+2\lambda_{\ell}\nu_{1})(\nu_{2}-1)}}{8\lambda_{\ell}\nu_{1}^{2}(\nu_{2}-1)}.\tag{39}
\end{equation}

There is an alternative expression for $\gamma^{*}_{\ell}(D,P)$ in terms of $\hat{\lambda}_{\ell}^{*}(D,P)$, obtained by solving (37) as a quadratic equation in $\gamma^{*}_{\ell}(D,P)$:

\begin{equation}
\gamma^{*}_{\ell}(D,P)=\frac{2\lambda_{\ell}}{1+\sqrt{1+16\lambda_{\ell}\hat{\lambda}_{\ell}^{*}(D,P)\nu_{1}^{2}}}.\tag{40}
\end{equation}

This expression is useful later in Corollary 1.
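The closed-form solution of (38) and the relations (37) and (40) can be cross-checked numerically. The sketch below transcribes the closed-form root of (38) given above, computes $\hat{\lambda}^*_\ell$ from (37), and verifies that (38) holds and that (40) returns the same water-level. When $\nu_2=1$ the leading coefficient of the quadratic vanishes and the closed form divides by zero, so the sketch falls back to the root of the resulting linear equation; that special case is our addition. Multiplier values are illustrative.

```python
import math

def gamma_star(lam, nu1, nu2):
    """Root of (38) in (0, lam), using the closed form given above (valid for nu2 != 1)."""
    if nu2 == 1:
        # Leading coefficient vanishes; (38) reduces to a linear equation in gamma.
        return lam * (2 * lam * nu1 + nu2) / (2 * lam * nu1 + 4 * lam**2 * nu1**2 + nu2)
    s = nu2 + 2 * lam * nu1 + 4 * lam**2 * nu1**2
    disc = s**2 + 16 * lam**2 * nu1**2 * (nu2 + 2 * lam * nu1) * (nu2 - 1)
    num = -2 * lam * nu1 * (1 + 2 * lam * nu1) - nu2 + math.sqrt(disc)
    return num / (8 * lam * nu1**2 * (nu2 - 1))

def lam_hat_star(lam, gamma, nu1):
    """Reconstruction variance from (37)."""
    return (lam - gamma) / (4 * gamma**2 * nu1**2)

def residual_38(lam, gamma, nu1, nu2):
    """Left-hand side minus right-hand side of (38)."""
    lhs = nu1 * (1 - 2 * nu1 * gamma)
    rhs = 0.5 * nu2 * (4 * gamma**2 * nu1**2 / (lam - gamma) - 1 / lam)
    return lhs - rhs

def gamma_from_40(lam, lam_hat, nu1):
    """Water-level recovered from the reconstruction variance via (40)."""
    return 2 * lam / (1 + math.sqrt(1 + 16 * lam * lam_hat * nu1**2))

lam, nu1 = 1.0, 1.0
for nu2 in (0.5, 1.0, 2.0):
    g = gamma_star(lam, nu1, nu2)
    h = lam_hat_star(lam, g, nu1)
    print(nu2, g, h, residual_38(lam, g, nu1, nu2), gamma_from_40(lam, h, nu1) - g)
```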

Expressions (39) and (37) give the following generalized reverse water-filling interpretation of the optimal RDP solution. For a given distortion constraint $D$ and perception constraint $P$, each component of the source with variance $\lambda_{\ell}$ is reconstructed by $\hat{Z}_{\ell}$ with variance $\hat{\lambda}^{*}_{\ell}(D,P)$. Because $\gamma^{*}_{\ell}(D,P)$ is the MMSE of estimating $Z_{\ell}$ from $\hat{Z}_{\ell}$, this requires a rate of $\frac{1}{2}\log\left(\frac{\lambda_{\ell}}{\gamma^{*}_{\ell}(D,P)}\right)$. The parameters $\hat{\lambda}^{*}_{\ell}(D,P)$ and $\gamma^{*}_{\ell}(D,P)$ are chosen to satisfy the distortion and perception constraints.
As already mentioned, $\gamma^{*}_{\ell}(D,P)$ can be thought of as the water-level; cf. (21).

When both the distortion and the perception constraints are active, i.e., $\nu_{1},\nu_{2}>0$, it can be proved (as shown in the theorem below) that

\begin{equation}
\gamma^{*}_{\ell}(D,P)<\lambda_{\ell},\quad\forall\ell\in\{1,\ldots,L\},\tag{41}
\end{equation}

so every component of the source is allocated a positive rate, regardless of its variance. This is unlike the traditional reverse water-filling solution, where a component is allocated zero rate if its variance is below the water-level. Moreover, in the traditional reverse water-filling, all components allocated positive rate incur the same distortion, equal to the water-level; here, by contrast, the per-component distortions $D_{\ell}(\gamma_{\ell}^{*}(D,P),\hat{\lambda}_{\ell}^{*}(D,P))$ may differ across components. Hence, an unequal-distortion allocation may be optimal when both the perception and distortion constraints are active.
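A way to see (41), and the unequal water-levels, is to clear the denominator in (38): multiplying both sides by $2\lambda_\ell(\lambda_\ell-\gamma)$ gives a quadratic $f(\gamma)$ with $f(0)=\lambda_\ell(2\lambda_\ell\nu_1+\nu_2)>0$ and $f(\lambda_\ell)=-4\lambda_\ell^3\nu_1^2\nu_2<0$, so for $\nu_1,\nu_2>0$ exactly one root lies strictly inside $(0,\lambda_\ell)$. The sketch below locates that root by bisection and tabulates the resulting allocation for hypothetical eigenvalues and multipliers. With $\nu_1=1$, classical reverse water-filling would use the water-level $1/(2\nu_1)=0.5$ and give the component with variance $0.1$ zero rate; here it receives a small positive rate. Natural logarithms are assumed, and the function names are hypothetical.

```python
import math

def water_level(lam, nu1, nu2, iters=200):
    """Root of (38) in (0, lam) by bisection, for nu1, nu2 > 0.

    f is (38) multiplied by 2*lam*(lam - g); f(0) > 0 and f(lam) < 0, so the
    loop keeps f(lo) > 0 >= f(hi) and shrinks the bracket around the root.
    """
    def f(g):
        return (2 * lam * nu1 * (1 - 2 * nu1 * g) * (lam - g)
                - nu2 * (4 * lam * nu1**2 * g**2 - (lam - g)))
    lo, hi = 0.0, lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate(lams, nu1, nu2):
    """Per component: (water-level, reconstruction variance, rate, distortion, KL loss)."""
    rows = []
    for lam in lams:
        g = water_level(lam, nu1, nu2)
        h = (lam - g) / (4 * g**2 * nu1**2)                  # (37)
        rate = 0.5 * math.log(lam / g)                        # summand of (34a)
        dist = lam - 2 * math.sqrt(h * (lam - g)) + h         # (32)
        kl = 0.5 * (h / lam - 1 + math.log(lam / h))          # (33)
        rows.append((g, h, rate, dist, kl))
    return rows

lams = [4.0, 1.0, 0.1]   # hypothetical eigenvalues, one of them small
rows = allocate(lams, nu1=1.0, nu2=0.5)
for lam, (g, h, r, d, p) in zip(lams, rows):
    print(f"lambda={lam:4.1f} gamma*={g:.4f} lambda_hat*={h:.4f} rate={r:.4f} D_l={d:.4f}")
```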

It is also possible that the distortion or the perception constraint is inactive. If the distortion constraint is active while the perception constraint is inactive, i.e., $\nu_{1}>0$ and $\nu_{2}=0$, and $\eta_{\ell}=\eta^{\prime}_{\ell}=0$ for all $\ell\in\{1,\ldots,L\}$, then (35) and (36) yield the traditional reverse water-filling solution. Specifically, the water-level is given by $\min\{\frac{1}{2\nu_{1}},\lambda_{\ell}\}$, where $\nu_{1}$ satisfies

\begin{equation}
\sum_{\ell=1}^{L}\left[\lambda_{\ell}-\frac{1}{2\nu_{1}}\right]^{+}=\left[\sum_{\ell=1}^{L}\lambda_{\ell}-D\right]^{+}.\tag{42}
\end{equation}

Setting $\nu(D)=\frac{1}{2\nu_{1}}$, we see that the above expression is the same as (17).
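The classical special case can be computed by bisection on the water-level $\theta=\frac{1}{2\nu_1}$: since $[\lambda_\ell-\theta]^+=\lambda_\ell-\min\{\theta,\lambda_\ell\}$, equation (42) with $0<D<\sum_\ell\lambda_\ell$ is equivalent to $\sum_\ell\min\{\theta,\lambda_\ell\}=D$, whose left-hand side is nondecreasing in $\theta$. The sketch below is a standard implementation of this idea, with made-up eigenvalues, natural logarithms, and a hypothetical function name.

```python
import math

def reverse_water_filling(lams, D, iters=200):
    """Classical reverse water-filling by bisection on the water-level theta.

    Solves sum_l min(theta, lambda_l) = D, which is (42) with theta = 1/(2*nu1)
    when 0 < D < sum(lams). Returns (theta, per-component levels, rate in nats).
    """
    if D <= 0:
        raise ValueError("D must be positive")
    if D >= sum(lams):
        return max(lams), list(lams), 0.0   # zero rate: every level equals lambda_l
    lo, hi = 0.0, max(lams)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if sum(min(theta, lam) for lam in lams) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    levels = [min(theta, lam) for lam in lams]
    rate = 0.5 * sum(math.log(lam / g) for lam, g in zip(lams, levels))
    return theta, levels, rate

theta, levels, rate = reverse_water_filling([4.0, 1.0, 0.1], D=1.1)
print(theta, levels, rate)   # the component with variance 0.1 gets zero rate
```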

If the distortion constraint is inactive, i.e., $\nu_{1}=0$, then (35) gives $\xi_{\ell}=\frac{1}{2\gamma^{*}_{\ell}(D,P)}>0$, and complementary slackness yields

\begin{equation}
\gamma^{*}_{\ell}(D,P)=\lambda_{\ell},\qquad\forall\ell\in\{1,\ldots,L\}.\tag{43}
\end{equation}

This implies that every component of the source is assigned zero rate when the distortion constraint is inactive: the decoder simply generates a reconstruction independent of the source, drawn from a distribution that satisfies the perception constraint. Such a distribution need not be unique, as the theorem below shows.

An interesting consequence of (41) and (43) is that, when the perception constraint is active, either all components are allocated a positive rate or all components are allocated zero rate. Hence the situation in traditional reverse water-filling, where some water-levels lie below the corresponding eigenvalues while others equal them, cannot occur when the perception constraint is active.

The above discussion is summarized in the following theorem.

Theorem 4

Let $(D,P)$ be a strictly feasible pair of distortion and perception constraints. The optimal solution of (34) with KL divergence as the perception metric is given as follows:

  1. If both the distortion and perception constraints are active², then there exist $\nu_{1},\nu_{2}>0$ such that $\gamma^{*}_{\ell}(D,P)$ is as expressed in (IV-C) and $\hat{\lambda}^{*}_{\ell}(D,P)$ is as expressed in (37). Here, $\nu_{1}$ and $\nu_{2}$ are chosen such that

    $$\sum_{\ell=1}^{L}D_{\ell}\left(\gamma^{*}_{\ell}(D,P),\hat{\lambda}_{\ell}^{*}(D,P)\right)=D, \tag{44}$$
    $$\sum_{\ell=1}^{L}P_{\ell}\left(\hat{\lambda}_{\ell}^{*}(D,P)\right)=P. \tag{45}$$

    In this case, every component has a positive rate.

    ² A constraint of a minimization problem is said to be inactive if the optimization problem with the same objective function, but with that constraint removed (while keeping all the other constraints), has at least one optimal solution that already satisfies all the original constraints.

  2. If the distortion constraint is active but the perception constraint is inactive, then there exists $\nu_{1}>0$ such that $\gamma^{*}_{\ell}(D,P)=\min\{\frac{1}{2\nu_{1}},\lambda_{\ell}\}$, $\hat{\lambda}^{*}_{\ell}(D,P)=\lambda_{\ell}-\min\{\frac{1}{2\nu_{1}},\lambda_{\ell}\}$, and

    $$\sum_{\ell=1}^{L}\left[\lambda_{\ell}-\frac{1}{2\nu_{1}}\right]^{+}=\left[\sum_{\ell=1}^{L}\lambda_{\ell}-D\right]^{+}. \tag{46}$$

    In this case, some components may have zero rate.

  3. If the distortion constraint is inactive, then $\gamma^{*}_{\ell}(D,P)=\lambda_{\ell}$ for all $\ell$, and $(\hat{\lambda}^{*}_{1}(D,P),\ldots,\hat{\lambda}^{*}_{L}(D,P))$ can be any point in the set

    $$\left\{(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{L})\ \middle|\ \sum_{\ell=1}^{L}P_{\ell}(\hat{\lambda}_{\ell})\leq P,\ \ \sum_{\ell=1}^{L}\left(\lambda_{\ell}+\hat{\lambda}_{\ell}\right)\leq D,\ \ \hat{\lambda}_{\ell}\geq 0\right\}. \tag{47}$$

    In this case, every component has zero rate.

Proof:

See Appendix D. ∎

IV-D RDP Function and Generalized Reverse Water-filling with Wasserstein-2 Distance as Perception Metric

Next, consider the Wasserstein-2 distance as the perception metric, i.e., $\phi(P_{X},P_{\hat{X}})=W_{2}^{2}(P_{X},P_{\hat{X}})$. For this metric, the distortion loss function of the $\ell$-th component is as in (32), while the perception loss function in (33) is replaced by the following:

$$P_{\ell}(\hat{\lambda}_{\ell})=\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}}\right)^{2}. \tag{48}$$
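The two per-component losses can be written as small functions. Equation (32) is not reproduced in this section, so the sketch assumes $D_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell})=\lambda_{\ell}+\hat{\lambda}_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}$, a form inferred from the first-order conditions (49)-(50) and from (59); treat it as an assumption rather than a quotation. The perception loss implements (48), which is the squared Wasserstein-2 distance between $\mathcal{N}(0,\lambda_{\ell})$ and $\mathcal{N}(0,\hat{\lambda}_{\ell})$. The function names are illustrative.

```python
import math


def distortion_loss(gamma, lam_hat, lam):
    """ASSUMED form of (32): E[(X_l - X_hat_l)^2] when the compressed component has
    distortion gamma and is rescaled to variance lam_hat (inferred from (49), (50), (59))."""
    return lam + lam_hat - 2.0 * math.sqrt(lam_hat * (lam - gamma))


def w2_perception_loss(lam_hat, lam):
    """(48): squared Wasserstein-2 distance between N(0, lam) and N(0, lam_hat)."""
    return (math.sqrt(lam) - math.sqrt(lam_hat)) ** 2


lam, gamma = 4.0, 1.0
# Conventional reconstruction (lam_hat = lam - gamma): distortion gamma, positive perception loss.
print(distortion_loss(gamma, lam - gamma, lam), w2_perception_loss(lam - gamma, lam))
# Variance-matched reconstruction (lam_hat = lam): zero perception loss, larger distortion.
print(distortion_loss(gamma, lam, lam), w2_perception_loss(lam, lam))
```

Setting $\hat{\lambda}_{\ell}=\lambda_{\ell}-\gamma_{\ell}$ recovers the conventional distortion $\gamma_{\ell}$, while $\hat{\lambda}_{\ell}=\lambda_{\ell}$ gives zero perception loss at distortion $2\lambda_{\ell}-2\sqrt{\lambda_{\ell}(\lambda_{\ell}-\gamma_{\ell})}$, which matches the per-component term of (59).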

The following theorem characterizes the RDP function with Wasserstein-2 perception loss in terms of an optimization problem.

Theorem 5

The rate-distortion-perception function $R(D,P)$ with Wasserstein-2 distance as the perception metric is given by the optimization program in (34) with the perception loss function (33) replaced by (48).

Proof:

The proof is similar to that of Theorem 3 with some differences which are highlighted in Appendix E. ∎

As in the KL-divergence case, the optimization program with the Wasserstein-2 distance is convex. For $\ell\in\{1,\ldots,L\}$, we have the following first-order conditions:

$$\frac{1}{2\gamma_{\ell}^{*}(D,P)}-\nu_{1}\sqrt{\frac{\hat{\lambda}^{*}_{\ell}(D,P)}{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}}-\xi_{\ell}=0, \tag{49}$$

and

$$\nu_{1}\left(-\sqrt{\frac{\lambda_{\ell}-\gamma^{*}_{\ell}(D,P)}{\hat{\lambda}^{*}_{\ell}(D,P)}}+1\right)+\nu_{2}\left(1-\sqrt{\frac{\lambda_{\ell}}{\hat{\lambda}^{*}_{\ell}(D,P)}}\right)+\eta_{\ell}=0. \tag{50}$$

Consider the case where both the distortion and perception constraints are active, i.e., $\nu_{1},\nu_{2}>0$ and $\xi_{\ell}=\eta_{\ell}=0$ for all $\ell\in\{1,\ldots,L\}$. In this case, (49) and (50) yield the following solutions:

$$\gamma_{\ell}^{*}(D,P)=\frac{\theta_{\ell}}{2\nu_{1}}, \tag{51}$$
$$\hat{\lambda}_{\ell}^{*}(D,P)=\frac{\lambda_{\ell}}{\left(1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}\right)^{2}}, \tag{52}$$

where $\theta_{\ell}$ is defined as the unique solution of the following equation:

$$\frac{\theta_{\ell}}{1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}}=\sqrt{1-\frac{\theta_{\ell}}{2\nu_{1}\lambda_{\ell}}}. \tag{53}$$

As in the case of KL divergence, when both the distortion and the perception constraints are active we have $\gamma^{*}_{\ell}(D,P)<\lambda_{\ell}$. Indeed, the left-hand side of (53) is positive, since $\theta_{\ell}=2\nu_{1}\gamma^{*}_{\ell}(D,P)>0$ and, by (49)-(50), $1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}=\sqrt{\lambda_{\ell}/\hat{\lambda}^{*}_{\ell}(D,P)}>0$. Hence the right-hand side is positive as well, which gives $\theta_{\ell}<2\nu_{1}\lambda_{\ell}$, i.e., $\gamma^{*}_{\ell}(D,P)<\lambda_{\ell}$ by (51). Thus, every component is compressed at a positive rate.
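Because the left-hand side of (53) is increasing in $\theta_{\ell}$ on $(0,1+\nu_{2}/\nu_{1})$, the right-hand side is decreasing on $(0,2\nu_{1}\lambda_{\ell})$, and the left-hand side is smaller at $\theta_{\ell}=0$, the root can be located by bisection on the intersection of those intervals. The sketch below does this for given multipliers $\nu_{1},\nu_{2}>0$, forms (51)-(52), and evaluates the left-hand sides of (49)-(50) with $\xi_{\ell}=\eta_{\ell}=0$ as a consistency check. It does not search for the multipliers that meet the constraints with equality, which would need an outer two-dimensional root-finder; all helper names are hypothetical.

```python
import math


def solve_theta(lam, nu1, nu2, iters=100):
    """Root of (53): theta / (1 + (1 - theta) * nu1 / nu2) = sqrt(1 - theta / (2 * nu1 * lam))."""
    r = nu1 / nu2
    # LHS increases on (0, 1 + 1/r), RHS decreases on (0, 2*nu1*lam); LHS < RHS at theta = 0.
    lo, hi = 0.0, min(2.0 * nu1 * lam, 1.0 + 1.0 / r)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lhs = mid / (1.0 + (1.0 - mid) * r)
        rhs = math.sqrt(1.0 - mid / (2.0 * nu1 * lam))
        if lhs < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def active_solution(lams, nu1, nu2):
    """(51)-(52): gamma*_l and lam_hat*_l for given multipliers nu1, nu2 > 0."""
    r = nu1 / nu2
    gammas, lam_hats = [], []
    for lam in lams:
        theta = solve_theta(lam, nu1, nu2)
        gammas.append(theta / (2.0 * nu1))
        lam_hats.append(lam / (1.0 + (1.0 - theta) * r) ** 2)
    return gammas, lam_hats


def kkt_residuals(lam, gamma, lam_hat, nu1, nu2):
    """Left-hand sides of (49) and (50) with xi_l = eta_l = 0; both should be close to 0."""
    r49 = 1.0 / (2.0 * gamma) - nu1 * math.sqrt(lam_hat / (lam - gamma))
    r50 = (nu1 * (1.0 - math.sqrt((lam - gamma) / lam_hat))
           + nu2 * (1.0 - math.sqrt(lam / lam_hat)))
    return r49, r50


lams, nu1, nu2 = [4.0, 2.0, 1.0], 0.5, 1.0
gammas, lam_hats = active_solution(lams, nu1, nu2)
for lam, g, lh in zip(lams, gammas, lam_hats):
    print(f"lam={lam}: gamma*={g:.4f}, lam_hat*={lh:.4f}, "
          f"residuals={kkt_residuals(lam, g, lh, nu1, nu2)}")
```

Any root found inside this bracket satisfies $\theta_{\ell}<2\nu_{1}\lambda_{\ell}$, so the computed $\gamma^{*}_{\ell}$ is always strictly below $\lambda_{\ell}$, consistent with every component receiving a positive rate.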

When the distortion constraint is active but the perception constraint is not, the problem reduces to traditional reverse water-filling. Finally, when the distortion constraint is not active, i.e., $\nu_{1}=0$, zero rate is assigned to all components. This discussion is summarized in the following theorem.

Theorem 6

Let $(D,P)$ be a strictly feasible pair of distortion and perception constraints. The optimal solution of (34) with the perception loss function (33) replaced by (48) is given as follows:

  1. If both the distortion and perception constraints are active, then there exist $\nu_{1},\nu_{2}>0$ such that $\gamma^{*}_{\ell}(D,P)$ is as expressed in (51) and $\hat{\lambda}^{*}_{\ell}(D,P)$ is as expressed in (52). Here, $\nu_{1}$ and $\nu_{2}$ are chosen such that

    $$\sum_{\ell=1}^{L}D_{\ell}\left(\gamma^{*}_{\ell}(D,P),\hat{\lambda}_{\ell}^{*}(D,P)\right)=D, \tag{54}$$
    $$\sum_{\ell=1}^{L}P_{\ell}\left(\hat{\lambda}_{\ell}^{*}(D,P)\right)=P. \tag{55}$$

    In this case, every component has a positive rate.

  2. If the distortion constraint is active but the perception constraint is inactive, then there exists $\nu_{1}>0$ such that $\gamma^{*}_{\ell}(D,P)=\min\{\frac{1}{2\nu_{1}},\lambda_{\ell}\}$, $\hat{\lambda}^{*}_{\ell}(D,P)=\lambda_{\ell}-\min\{\frac{1}{2\nu_{1}},\lambda_{\ell}\}$, and

    $$\sum_{\ell=1}^{L}\left[\lambda_{\ell}-\frac{1}{2\nu_{1}}\right]^{+}=\left[\sum_{\ell=1}^{L}\lambda_{\ell}-D\right]^{+}. \tag{56}$$

    In this case, some components may have zero rate.

  3. If the distortion constraint is inactive, then $\gamma^{*}_{\ell}(D,P)=\lambda_{\ell}$ for all $\ell$, and $(\hat{\lambda}^{*}_{1}(D,P),\ldots,\hat{\lambda}^{*}_{L}(D,P))$ can be any point in the set

    $$\left\{(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{L})\ \middle|\ \sum_{\ell=1}^{L}P_{\ell}(\hat{\lambda}_{\ell})\leq P,\ \ \sum_{\ell=1}^{L}\left(\lambda_{\ell}+\hat{\lambda}_{\ell}\right)\leq D,\ \ \hat{\lambda}_{\ell}\geq 0\right\}. \tag{57}$$

    In this case, every component has zero rate.

Proof:

See Appendix F. ∎
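Item 3 of Theorem 6 says that, when the distortion constraint is inactive, any $(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{L})$ in (57) is optimal and every component has zero rate. A minimal membership test for (57) with the loss (48) is sketched below; the function name is hypothetical. Two corner points illustrate the set: $\hat{\lambda}_{\ell}=\lambda_{\ell}$ gives zero perception loss at distortion $2\sum_{\ell}\lambda_{\ell}$, and $\hat{\lambda}_{\ell}=0$ (a constant reconstruction) gives distortion and perception loss both equal to $\sum_{\ell}\lambda_{\ell}$.

```python
import math


def in_zero_rate_set(lams, lam_hats, D, P):
    """True if (lam_hat_1, ..., lam_hat_L) lies in the set (57) with the W2 loss (48)."""
    if any(lh < 0.0 for lh in lam_hats):
        return False
    perception = sum((math.sqrt(l) - math.sqrt(lh)) ** 2 for l, lh in zip(lams, lam_hats))
    # With gamma_l = lam_l, the per-component distortion in (57) is lam_l + lam_hat_l.
    distortion = sum(l + lh for l, lh in zip(lams, lam_hats))
    return perception <= P and distortion <= D


lams = [4.0, 1.0, 9.0]
print(in_zero_rate_set(lams, lams, 28.0, 0.0))             # variance-matched, perfect perception
print(in_zero_rate_set(lams, [0.0, 0.0, 0.0], 14.0, 14.0))  # constant reconstruction
```

For the variances (4, 1, 9), the variance-matched point needs $D\geq 28$ at $P=0$, and the constant reconstruction needs $D\geq 14$ and $P\geq 14$; shrinking either budget below these values removes the corresponding point from the set.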

IV-E Perceptually Perfect Reconstruction

In this section, we focus on the special case of perfect perceptual quality and study the properties of the RDP function with $P=0$.

Figure 2: Generalized reverse water-filling solution for perceptually perfect reconstruction. The source is first compressed to a representation whose components have distortion levels $\gamma_{\ell}^{*}(D,0)$, $\ell=1,\ldots,L$. After compression, each component has variance $\lambda_{\ell}-\gamma^{*}_{\ell}(D,0)$. Each component is then scaled to generate a reconstruction whose distribution matches that of the original source.
Corollary 1

The RDP function of a Gaussian vector source with $P=0$ is

$$R(D,0)=\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{1+\sqrt{1+16\nu_{1}^{2}\lambda_{\ell}^{2}}}{2}, \tag{58}$$

for some positive $\nu_{1}$ that satisfies

$$D=\sum_{\ell=1}^{L}\left[2\lambda_{\ell}-2\sqrt{\lambda_{\ell}\left(\lambda_{\ell}-\gamma^{*}_{\ell}(D,0)\right)}\right], \tag{59}$$

where

$$\gamma^{*}_{\ell}(D,0)=\frac{2\lambda_{\ell}}{1+\sqrt{1+16\nu_{1}^{2}\lambda_{\ell}^{2}}},\qquad\ell\in\{1,\ldots,L\}. \tag{60}$$
Proof:

See Appendix G. ∎

An interpretation of the optimal rate allocation in the case $P=0$ is as follows. By (58), the optimal rate allocated to the $\ell$-th component is $\frac{1}{2}\log\frac{1+\sqrt{1+16\nu_{1}^{2}\lambda_{\ell}^{2}}}{2}$, which is increasing in $\lambda_{\ell}$. Hence a component with a larger variance is compressed at a higher rate. Further, by (60), it also has a higher water-level.
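Corollary 1 determines $\nu_{1}$ only implicitly through (59). Since $\gamma^{*}_{\ell}(D,0)$ in (60) decreases in $\nu_{1}$ and each term of (59) increases in $\gamma^{*}_{\ell}(D,0)$, the right-hand side of (59) decreases from $2\sum_{\ell}\lambda_{\ell}$ toward $0$ as $\nu_{1}$ grows, so a bisection over $\log\nu_{1}$ recovers it. The following Python sketch evaluates $R(D,0)$ this way and makes the monotonicity claims above easy to check; the helper names, the search range $[e^{-30},e^{30}]$ and the use of nats are illustrative choices, not taken from the text.

```python
import math


def water_levels(lams, nu1):
    """(60): gamma*_l(D, 0) for a given multiplier nu1 > 0."""
    return [2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * nu1 ** 2 * lam ** 2)) for lam in lams]


def distortion_at(lams, nu1):
    """Right-hand side of (59) evaluated at the water-levels (60)."""
    return sum(2.0 * lam - 2.0 * math.sqrt(lam * (lam - g))
               for lam, g in zip(lams, water_levels(lams, nu1)))


def rdp_perfect_perception(lams, D, iters=200):
    """Return (R(D, 0), per-component rates, water-levels, nu1) via (58)-(60), rates in nats."""
    if not 0.0 < D < 2.0 * sum(lams):
        raise ValueError("need 0 < D < 2 * sum(lams)")
    lo, hi = -30.0, 30.0  # bracket for log(nu1); an arbitrary, generous choice
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # (59) is decreasing in nu1, so too much distortion means nu1 must grow.
        if distortion_at(lams, math.exp(mid)) > D:
            lo = mid
        else:
            hi = mid
    nu1 = math.exp(0.5 * (lo + hi))
    rates = [0.5 * math.log((1.0 + math.sqrt(1.0 + 16.0 * nu1 ** 2 * lam ** 2)) / 2.0)
             for lam in lams]
    return sum(rates), rates, water_levels(lams, nu1), nu1


R, rates, gammas, nu1 = rdp_perfect_perception([4.0, 2.0, 1.0], 3.0)
print(f"R(3, 0) = {R:.4f} nats, rates {[round(r, 4) for r in rates]}, "
      f"water-levels {[round(g, 4) for g in gammas]}")
```

For variances (4, 2, 1) and D = 3, all three rates are strictly positive and both the rates and the water-levels are ordered like the variances, whereas conventional reverse water-filling at the same D assigns the smallest component zero rate.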

Under general perception and distortion constraints, the encoding and decoding strategy adopted in this paper (which involves constructing $\hat{Z}_{\ell}$ as in (27)) can be thought of as first compressing each component of the source at an individual rate determined by the distortion level $\gamma_{\ell}^{*}(D,P)$ through the conventional rate-distortion tradeoff, and then scaling the compressed source to variance $\hat{\lambda}_{\ell}^{*}(D,P)$ to satisfy the perception constraint. For perfect perception ($P=0$), the compression rate becomes (58) and the distortion level becomes (60); each component of the compressed signal is then simply scaled to match the variance of the source, which ensures zero perception loss. The distortion after scaling is given by (59). This procedure is illustrated in Fig. 2.
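The compress-then-scale picture can be checked by simulation for a single component. The sketch below assumes the textbook backward Gaussian test channel $X=Z+N$, with $Z\sim\mathcal{N}(0,\lambda-\gamma)$ and $N\sim\mathcal{N}(0,\gamma)$ independent, as the compressed representation; this is a stand-in chosen for illustration and not necessarily the construction (27). The decoder scales $Z$ by $\sqrt{\lambda/(\lambda-\gamma)}$, so the empirical mean squared error should approach $2\lambda-2\sqrt{\lambda(\lambda-\gamma)}$, the per-component term of (59), and the empirical output variance should approach $\lambda$. The function name and parameters are hypothetical.

```python
import math
import random


def simulate_component(lam, gamma, n=200_000, seed=0):
    """Empirical (MSE, output variance) of compress-then-scale for one component.

    Assumed backward test channel: X = Z + N, Z ~ N(0, lam - gamma), N ~ N(0, gamma),
    independent, so X ~ N(0, lam) and E[(X - Z)^2] = gamma. The decoder outputs
    X_hat = sqrt(lam / (lam - gamma)) * Z, whose variance is lam (zero perception loss).
    """
    rng = random.Random(seed)
    sd_z, sd_n = math.sqrt(lam - gamma), math.sqrt(gamma)
    scale = math.sqrt(lam / (lam - gamma))
    sq_err = sq_out = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, sd_z)
        x = z + rng.gauss(0.0, sd_n)
        x_hat = scale * z
        sq_err += (x - x_hat) ** 2
        sq_out += x_hat ** 2
    return sq_err / n, sq_out / n


mse, var_out = simulate_component(4.0, 1.0)
predicted = 2 * 4.0 - 2 * math.sqrt(4.0 * (4.0 - 1.0))  # per-component term of (59)
print(f"empirical MSE {mse:.4f} vs predicted {predicted:.4f}; output variance {var_out:.3f}")
```

The squared error splits as $N+(1-c)Z$ with $c=\sqrt{\lambda/(\lambda-\gamma)}$, so its variance is $\gamma+(1-c)^{2}(\lambda-\gamma)$, which simplifies to the closed form above; the simulation only confirms this numerically.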

We further note that, at a fixed $R$, the rate allocated to each component generally differs across $(D,P)$ tradeoff points. For the scalar Gaussian source, a universal representation for different $(D,P)$ points at a fixed $R$ is possible via scaling [9]; for the Gaussian vector source, however, no such universal representation exists, because the rate allocation across components differs between $(D,P)$ tradeoff points.

Next, we investigate the asymptotic behavior of the compression rate and the distortion level in the perfect perception case.

Proposition 3 (High-Distortion Compression)

In the high-distortion and perfect perception regime, we have that for sufficiently small $\epsilon>0$,

$$R\left(2\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,0\right)=\frac{\epsilon^{2}}{8\sum_{\ell=1}^{L}\lambda_{\ell}^{2}}+O(\epsilon^{3}), \tag{61}$$

where the water-levels are given by

$$\gamma^{*}_{\ell}\left(2\sum_{j=1}^{L}\lambda_{j}-\epsilon,0\right)=\lambda_{\ell}-\frac{\epsilon^{2}\lambda_{\ell}^{3}}{4\left(\sum_{j=1}^{L}\lambda_{j}^{2}\right)^{2}}+O(\epsilon^{3}),\quad\ell\in\{1,\ldots,L\}. \tag{62}$$
Proof:

See Appendix H-1. ∎

Here, we express $R(D,0)$ in terms of its deviation from the maximum distortion under perfect perception, which is attained at zero rate. This maximum distortion can be shown to be $2\sum_{\ell=1}^{L}\lambda_{\ell}$, i.e., twice the total variance of the source [9], because at zero rate the decoder simply generates an independent Gaussian random vector with the same covariance matrix as the source. Comparing $R\left(2\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,0\right)$ in Proposition 3 with $R\left(\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,\infty\right)$ in Proposition 1, it is interesting to see that the variances of the source enter $R\left(2\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,0\right)$ through $\sum_{\ell=1}^{L}\lambda^{2}_{\ell}$, the sum of the squared variances over all components.
This is in contrast to the corresponding factor in $R\left(\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,\infty\right)$ for the traditional reverse water-filling solution, which is simply $\lambda^{\max}$. The difference is a consequence of the perfect perception constraint, which requires every component to be reconstructed at the decoder with the same variance as the source.
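Proposition 3 can be sanity-checked numerically by sweeping $\nu_{1}$ instead of $D$: for a small $\nu_{1}$, (58)-(60) give an exact point on the $P=0$ curve, from which $\epsilon=2\sum_{\ell}\lambda_{\ell}-D$ is read off and compared with (61)-(62). The sketch below writes $\lambda_{\ell}-\gamma^{*}_{\ell}=\lambda_{\ell}s/(1+\sqrt{1+s})^{2}$ with $s=16\nu_{1}^{2}\lambda_{\ell}^{2}$, and the logarithm in (58) via `math.log1p`, to avoid cancellation; these rewrites are an implementation choice, and the function name and test values are hypothetical.

```python
import math


def high_distortion_check(lams, nu1):
    """Exact (eps, R, gammas) from (58)-(60) for small nu1, plus the approximations (61)-(62)."""
    A = sum(lam ** 2 for lam in lams)
    eps, rate, gammas = 0.0, 0.0, []
    for lam in lams:
        s = 16.0 * nu1 ** 2 * lam ** 2
        root = math.sqrt(1.0 + s)
        gap = lam * s / (1.0 + root) ** 2                   # lam - gamma*, cancellation-free
        gammas.append(2.0 * lam / (1.0 + root))             # (60)
        eps += 2.0 * math.sqrt(lam * gap)                   # 2*sum(lam) - D, by (59)
        rate += 0.5 * math.log1p(s / (2.0 * (1.0 + root)))  # (58): 0.5*log((1 + root)/2)
    approx_rate = eps ** 2 / (8.0 * A)                                              # (61)
    approx_gammas = [lam - eps ** 2 * lam ** 3 / (4.0 * A ** 2) for lam in lams]    # (62)
    return eps, rate, approx_rate, gammas, approx_gammas


eps, rate, approx_rate, gammas, approx_gammas = high_distortion_check([4.0, 2.0, 1.0], 1e-3)
print(f"eps={eps:.4f}, R={rate:.6e}, (61) gives {approx_rate:.6e}")
```

Decreasing $\nu_{1}$ moves the point toward the zero-rate distortion $2\sum_{\ell}\lambda_{\ell}$, and the gap between the exact rate and (61) shrinks faster than $\epsilon^{2}$, as the $O(\epsilon^{3})$ remainder predicts.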

Proposition 4 (Low-Distortion Compression)

In the low-distortion and perfect perception regime, we have that for sufficiently small $\epsilon>0$,

$$R(\epsilon,0)=\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{L\lambda_{\ell}}{\epsilon}+\frac{\epsilon}{8L}\sum_{\ell=1}^{L}\frac{1}{\lambda_{\ell}}+O(\epsilon^{2}), \tag{63}$$

where the water-levels are given by

$$\gamma^{*}_{\ell}(\epsilon,0)=\frac{\epsilon}{L}-\frac{\epsilon^{2}}{2L^{2}\lambda_{\ell}}+\frac{\epsilon^{2}}{4L^{3}}\sum_{\ell'=1}^{L}\frac{1}{\lambda_{\ell'}}+O(\epsilon^{3}),\quad \ell\in\{1,\ldots,L\}. \tag{64}$$
Proof:

See Appendix H-2. ∎
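The expansion (63) can be checked numerically under an assumption that is not spelled out in this section: that, with perfect perception, the optimal scheme assigns a distortion $D_\ell$ to the $\ell$-th eigen-component, and that this component costs the scalar rate $\frac{1}{2}\log\frac{\lambda_\ell}{D_\ell}-\frac{1}{2}\log\left(1-\frac{D_\ell}{4\lambda_\ell}\right)$. This is $-\frac{1}{2}\log(1-\rho_\ell^2)$ for a jointly Gaussian reconstruction with the source variance $\lambda_\ell$, correlation $\rho_\ell\ge 0$, and distortion $2\lambda_\ell(1-\rho_\ell)$. The sketch minimizes the total rate subject to $\sum_\ell D_\ell=\epsilon$ by bisection on the Lagrange multiplier and compares the result with the first two terms of (63). Rates are in nats, the eigenvalues are those of Fig. 3, and all function names are hypothetical.

```python
import math

def _component_distortion(a, lam):
    # Stationarity condition 1/D - 1/(4*lam - D) = a, solved for its root in (0, 2*lam)
    # and written in a cancellation-free form.
    return 4.0 * lam / (2.0 * a * lam + 1.0 + math.sqrt(4.0 * a * a * lam * lam + 1.0))

def rdp_perfect_perception(lams, eps, iters=200):
    """Minimize sum_l r_l(D_l) subject to sum_l D_l = eps (assumed per-component model, nats)."""
    lo, hi = 1e-12, 1e12  # bracket for the multiplier a; sum_l D_l(a) decreases in a
    for _ in range(iters):
        a = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if sum(_component_distortion(a, lam) for lam in lams) > eps:
            lo = a
        else:
            hi = a
    a = math.sqrt(lo * hi)
    ds = [_component_distortion(a, lam) for lam in lams]
    return sum(0.5 * math.log(lam / d) - 0.5 * math.log(1.0 - d / (4.0 * lam))
               for lam, d in zip(lams, ds))

def expansion_63(lams, eps):
    """First two terms of the expansion (63)."""
    L = len(lams)
    return (0.5 * sum(math.log(L * lam / eps) for lam in lams)
            + eps / (8 * L) * sum(1.0 / lam for lam in lams))

lams = [3.0, 2.0, 5.0, 4.0, 1.0]
for eps in (1e-2, 1e-3):
    print(eps, rdp_perfect_perception(lams, eps) - expansion_63(lams, eps))  # should be O(eps^2)
```

The per-component objective is convex on $(0,2\lambda_\ell)$, so matching the multiplier across components yields the constrained minimum; the residual against (63) should shrink roughly like $\epsilon^2$.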

Comparing Proposition 4 with Proposition 2, we see that in this high-rate, low-distortion regime, the extra rate required to achieve perfect perception and the corresponding change in the water-levels are

\begin{align}
R(\epsilon,0)-R(\epsilon,\infty) &= \frac{\epsilon}{8L}\sum_{\ell=1}^{L}\frac{1}{\lambda_{\ell}}+O(\epsilon^{2}), \tag{65}\\
\gamma^{*}_{\ell}(\epsilon,\infty)-\gamma^{*}_{\ell}(\epsilon,0) &= \frac{\epsilon^{2}}{2L^{2}\lambda_{\ell}}-\frac{\epsilon^{2}}{4L^{3}}\sum_{\ell'=1}^{L}\frac{1}{\lambda_{\ell'}}+O(\epsilon^{3}),\quad \ell\in\{1,\ldots,L\}. \tag{66}
\end{align}
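To see which water-levels move in which direction, the leading $\epsilon^2$ coefficient in (66), $c_\ell=\frac{1}{2L^2\lambda_\ell}-\frac{1}{4L^3}\sum_{j=1}^{L}\frac{1}{\lambda_j}$, can be evaluated exactly for the example source of Fig. 3. The sketch below only evaluates (66) with rational arithmetic and makes no further assumption; the function name is hypothetical. A positive $c_\ell$ means that, for small $\epsilon$, the perfect-perception water-level of component $\ell$ lies below the unconstrained one.

```python
from fractions import Fraction

def water_level_gap_coeffs(lams):
    """Leading eps^2 coefficients of gamma_l(eps, inf) - gamma_l(eps, 0), read off from (66)."""
    L = len(lams)
    inv_sum = sum(Fraction(1) / Fraction(lam) for lam in lams)
    return [Fraction(1, 2 * L**2) / Fraction(lam) - inv_sum / (4 * L**3) for lam in lams]

lams = [3, 2, 5, 4, 1]
coeffs = water_level_gap_coeffs(lams)
for lam, c in zip(lams, coeffs):
    print(lam, c, float(c))
# Summed over components, the gap equals (sum_l 1/lambda_l) / (4 L^2), which is positive.
print(sum(coeffs))
```

For this source only the component with $\lambda_3=5$ has a negative coefficient, i.e., its water-level rises slightly under perfect perception, while the others fall slightly, which is the kind of small spread visible in Fig. 3(d).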
[Figure 3 panels: (a) High distortion; no perception constraint. (b) Low distortion; no perception constraint. (c) High distortion; zero perception loss. (d) Low distortion; zero perception loss.]
Figure 3: The water-levels assigned to different components for a Gaussian vector source with $\lambda_1=3$, $\lambda_2=2$, $\lambda_3=5$, $\lambda_4=4$ and $\lambda_5=1$.

Fig. 3 shows the water-levels of the different components for both low-distortion and high-distortion compression with $P=\infty$ or $P=0$, for an example Gaussian vector source. The water-levels determine the compression rate assigned to each component.

In Fig. 3(a), for high-distortion compression with no perception constraint, all components except the one with the largest eigenvalue are allocated a zero compression rate (cf. Proposition 1). With an active perception constraint, as shown in Fig. 3(c) for the case $P=0$, all components are allocated positive rates (cf. Proposition 3).

In Fig. 3(b), for low-distortion compression with no perception constraint, the water-levels of all components are the same (cf. Proposition 2). At low distortion and with an active perception constraint, as shown in Fig. 3(d) for the case $P=0$, the water-levels of the different components are approximately equal, with slight differences determined by (64) in Proposition 4. Therefore, in the low-distortion regime, the water-levels of all components are approximately the same regardless of the perception constraint.

V Conclusions

This paper characterizes the RDP function for a Gaussian vector source. In contrast to the traditional reverse water-filling solution (without a perception constraint), the water-levels assigned to different components are not necessarily equal. When both distortion and perception constraints are active, every component is assigned a positive rate. These results have implications for perception-aware image coding.

Appendix A Asymptotic Analysis of the Traditional RD Function

A-1 High-Distortion Compression

Here, we consider $D=\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon$ for sufficiently small $\epsilon>0$. Without loss of generality, we assume that the eigenvalues are ordered as follows:

$$\lambda_{1}\le\lambda_{2}\le\ldots\le\lambda_{L}. \tag{67}$$

First, consider the case $|L^{\max}|=1$. The distortion constraint (17) implies that

$$\sum_{\ell=1}^{L}[\lambda_{\ell}-\nu(D)]^{+}=\epsilon. \tag{68}$$

The above condition implies that, for small enough $\epsilon>0$, $\nu(D)$ should satisfy

$$\lambda_{1}\le\lambda_{2}\le\ldots\le\lambda_{L-1}\le\nu(D)<\lambda_{L}. \tag{69}$$

Combining (69) with (68) yields

$$\nu(D)=\lambda_{L}-\epsilon. \tag{70}$$

Plugging the above into the RDP function of Proposition 1, we get

\begin{align}
R\left(\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,\infty\right) &= \frac{1}{2}\log\frac{\lambda_{L}}{\lambda_{L}-\epsilon} \tag{71}\\
&= \frac{1}{2\lambda_{L}}\epsilon+O(\epsilon^{2}). \tag{72}
\end{align}

Finally, noting that $\lambda_{L}=\max_{\ell}\lambda_{\ell}$ gives (22).

If $|L^{\max}|>1$, then, similarly to the above discussion, all components except those with the largest eigenvalue are assigned a zero compression rate, and the water-level becomes

$$\nu(D)=\lambda^{\max}-\frac{\epsilon}{|L^{\max}|}, \tag{73}$$

which yields the rate

\begin{align}
R\left(\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,\infty\right) &= \frac{|L^{\max}|}{2}\log\frac{\lambda_{L}}{\lambda_{L}-\frac{\epsilon}{|L^{\max}|}} \tag{74}\\
&= \frac{1}{2\lambda_{L}}\epsilon+O(\epsilon^{2}). \tag{75}
\end{align}

This proves (22) for arbitrary $L^{\max}$.

A-2 Low-Distortion Compression

Consider the case $D=\epsilon$ for sufficiently small $\epsilon>0$. In this low-distortion regime, the water-level $\nu(D)$ lies below every eigenvalue, so no component is saturated, i.e., $\min\{\lambda_{\ell},\nu(D)\}=\nu(D)$ for all $\ell$. Thus, Proposition 1 simplifies to

$$R(\epsilon,\infty)=\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\nu(D)}. \tag{76}$$

Also, the distortion constraint (17) implies that

$$\nu(D)=\frac{D}{L}. \tag{77}$$

Combining (76) and (77), we get the rate expression (24) in Proposition 2.
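Both asymptotic regimes of this appendix can be cross-checked against a direct numerical reverse water-filling. The sketch below assumes the standard reverse water-filling form of Proposition 1, consistent with (68), (71) and (76): $R(D,\infty)=\sum_{\ell}\frac{1}{2}\log\frac{\lambda_\ell}{\min\{\lambda_\ell,\nu\}}$ with $\nu$ chosen so that $\sum_\ell\min\{\lambda_\ell,\nu\}=D$. It finds $\nu$ by bisection, works in nats, and uses a hypothetical function name; the second eigenvalue list is an arbitrary example with $|L^{\max}|=2$.

```python
import math

def reverse_water_filling(lams, D, iters=200):
    """Return (nu, R) with sum_l min(lam_l, nu) = D and R = sum_l 0.5*log(lam_l / min(lam_l, nu))."""
    lo, hi = 0.0, max(lams)  # sum_l min(lam_l, nu) is nondecreasing in nu on this interval
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if sum(min(lam, nu) for lam in lams) < D:
            lo = nu
        else:
            hi = nu
    nu = 0.5 * (lo + hi)
    rate = sum(0.5 * math.log(lam / min(lam, nu)) for lam in lams)
    return nu, rate

eps = 1e-3
# High distortion with |L^max| = 1 and |L^max| = 2: compare with (70)/(73) and (72)/(75).
for lams in ([3.0, 2.0, 5.0, 4.0, 1.0], [1.0, 4.0, 4.0]):
    lam_max = max(lams)
    n_max = lams.count(lam_max)
    nu, rate = reverse_water_filling(lams, sum(lams) - eps)
    print(nu, lam_max - eps / n_max, rate, eps / (2 * lam_max))

# Low distortion: nu should equal D / L as in (77), and R should match (76).
lams = [3.0, 2.0, 5.0, 4.0, 1.0]
nu, rate = reverse_water_filling(lams, eps)
print(nu, eps / len(lams))
```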

Appendix B Proof of Theorem 2

First, we prove the optimality of Gaussian reconstruction for the case of the KL-divergence as the perception metric. Define the following distribution

$$P_{\hat{X}^{*}|X}=\arg\min_{\substack{P_{\hat{X}|X}:\\ \mathbb{E}[\|X-\hat{X}\|^{2}]\le D\\ D(P_{\hat{X}}\|P_{X})\le P}} I(X;\hat{X}). \tag{78}$$

Now, let $\hat{X}_{G}$ be a random vector that is jointly Gaussian with $X$ and such that $(\hat{X}_{G},X)$ has the same mean and covariance matrix as $(\hat{X}^{*},X)$, i.e.,

\begin{align}
\mathbb{E}[\hat{X}_{G}] &= \mathbb{E}[\hat{X}^{*}], \tag{79a}\\
\text{cov}(\hat{X}_{G},X) &= \text{cov}(\hat{X}^{*},X). \tag{79b}
\end{align}

We proceed with lower bounding the rate as follows

\begin{align}
I(X;\hat{X}^{*}) &= h(X)-h(X|\hat{X}^{*}) \tag{80}\\
&\ge h(X)-h(X|\hat{X}_{G}) \tag{81}\\
&= I(X;\hat{X}_{G}), \tag{82}
\end{align}

where (81) follows from (79) and the fact that, for a fixed covariance matrix, a jointly Gaussian distribution maximizes the conditional differential entropy [17, Lemma 2]. Condition (79) also implies that, for the distortion loss, we have

$$D\ge\mathbb{E}[\|X-\hat{X}^{*}\|^{2}]=\mathbb{E}[\|X-\hat{X}_{G}\|^{2}]. \tag{83}$$

Moreover, for the perception loss, we have

\begin{align}
D(P_{\hat{X}^{*}}\|P_{X}) &= \int P_{\hat{X}^{*}}(x)\log\frac{P_{\hat{X}^{*}}(x)}{P_{X}(x)}\,dx \tag{84}\\
&= -h(\hat{X}^{*})-\int P_{\hat{X}^{*}}(x)\log P_{X}(x)\,dx \tag{85}\\
&= -h(\hat{X}^{*})+\frac{1}{2}\int P_{\hat{X}^{*}}(x)\,x\Sigma_{X}^{-1}x^{T}\,dx+\frac{1}{2}\log\left((2\pi)^{L}\det(\Sigma_{X})\right) \tag{86}\\
&= -h(\hat{X}^{*})+\frac{1}{2}\int P_{\hat{X}_{G}}(x)\,x\Sigma_{X}^{-1}x^{T}\,dx+\frac{1}{2}\log\left((2\pi)^{L}\det(\Sigma_{X})\right) \tag{87}\\
&= -h(\hat{X}^{*})-\int P_{\hat{X}_{G}}(x)\log P_{X}(x)\,dx \tag{88}\\
&\ge -h(\hat{X}_{G})-\int P_{\hat{X}_{G}}(x)\log P_{X}(x)\,dx \tag{89}\\
&= D(P_{\hat{X}_{G}}\|P_{X}), \tag{90}
\end{align}

where (87) follows because the expression $x\Sigma_{X}^{-1}x^{T}$ for a vector $x=(x_{1},\ldots,x_{L})$ only contains terms such as $x_{\ell}^{2}$, $x_{\ell}$ and $x_{\ell}x_{\ell'}$ for $\ell,\ell'\in\{1,\ldots,L\}$, and since, according to (79), $\hat{X}^{*}$ has the same mean and covariance matrix as $\hat{X}_{G}$, the expected values of these terms with respect to $P_{\hat{X}^{*}}$ are equal to the corresponding expectations with respect to $P_{\hat{X}_{G}}$; (89) follows because, for a fixed covariance matrix, the differential entropy is maximized by a Gaussian distribution [16, Thm 8.6.5].
Finally, there is no loss of optimality in setting $\mathbb{E}[\hat{X}_{G}]=0$, since replacing $\hat{X}_{G}$ with $\hat{X}_{G}-\mathbb{E}[\hat{X}_{G}]$ does not increase $I(X;\hat{X}_{G})$, $\mathbb{E}[\|X-\hat{X}_{G}\|^{2}]$, or $D(P_{\hat{X}_{G}}\|P_{X})$.

Thus, replacing $\hat{X}^{*}$ by $\hat{X}_{G}$ does not increase the rate, while the distortion and perception constraints remain satisfied. Hence, the optimal $\hat{X}^{*}$ can be taken to be jointly Gaussian with $X$.
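Steps (87) and (89) can be illustrated in the scalar case with closed-form quantities. The sketch below assumes $X\sim\mathcal{N}(0,\lambda)$ and compares a non-Gaussian reconstruction distribution, chosen here purely for illustration as uniform on $[-w/2,w/2]$, with the Gaussian $\mathcal{N}(0,w^2/12)$ that has the same mean and variance. The cross-entropy term $-\int P_{\hat{X}}(x)\log P_X(x)\,dx=\frac{1}{2}\log(2\pi\lambda)+\frac{\mathbb{E}[\hat{X}^2]}{2\lambda}$ depends only on the second moment, as in (87), so the two KL divergences differ only through the differential entropies, and the Gaussian has the larger entropy, as in (89). Values are in nats; the function name and the numbers `lam`, `w` are hypothetical.

```python
import math

def kl_to_gaussian_source(h_recon, var_recon, lam):
    """D(P_Xhat || N(0, lam)) for a zero-mean Xhat with entropy h_recon and variance var_recon.

    Uses D = -h(Xhat) + cross-entropy; against N(0, lam) the cross-entropy
    depends on P_Xhat only through its second moment.
    """
    cross_entropy = 0.5 * math.log(2 * math.pi * lam) + var_recon / (2 * lam)
    return -h_recon + cross_entropy

lam, w = 2.0, 3.0                 # illustrative source variance and uniform width
var = w ** 2 / 12                 # variance of Uniform[-w/2, w/2]
kl_uniform = kl_to_gaussian_source(math.log(w), var, lam)
kl_gauss = kl_to_gaussian_source(0.5 * math.log(2 * math.pi * math.e * var), var, lam)
print(kl_uniform, kl_gauss, kl_uniform - kl_gauss)  # the gap equals 0.5*log(pi*e/6) > 0
```

The Gaussian value agrees with the familiar closed form $\frac{1}{2}\left(\frac{\sigma^2}{\lambda}-1-\log\frac{\sigma^2}{\lambda}\right)$, and the gap $\frac{1}{2}\log\frac{\pi e}{6}$ is exactly the entropy difference between the moment-matched Gaussian and the uniform distribution.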

For the case of the Wasserstein-2 distance as the perception metric, the lower bounding steps for $I(X;\hat{X}^{*})$ and $\mathbb{E}[\|X-\hat{X}^{*}\|^{2}]$ are the same as (82) and (83), respectively. For the perception metric, the steps are modified as follows. Define the following distribution

$$P_{U^{*}V^{*}}=\arg\inf_{\substack{\tilde{P}_{UV}:\\ \tilde{P}_{U}=P_{X}\\ \tilde{P}_{V}=P_{\hat{X}^{*}}}}\mathbb{E}_{\tilde{P}}[\|U-V\|^{2}]. \tag{91}$$

Now, define $P_{U_{G}V_{G}}$ to be a jointly Gaussian distribution such that

\begin{align}
\mathbb{E}[U_{G}] &= \mathbb{E}[U^{*}], \tag{92a}\\
\mathbb{E}[V_{G}] &= \mathbb{E}[V^{*}], \tag{92b}\\
\text{cov}(U_{G},V_{G}) &= \text{cov}(U^{*},V^{*}). \tag{92c}
\end{align}

Then, we have the following set of inequalities:

\begin{align}
P\ge W_{2}^{2}(P_{X},P_{\hat{X}^{*}}) &= \inf_{\substack{\tilde{P}_{UV}:\\ \tilde{P}_{U}=P_{X}\\ \tilde{P}_{V}=P_{\hat{X}^{*}}}}\mathbb{E}_{\tilde{P}}[\|U-V\|^{2}] \tag{93}\\
&= \mathbb{E}[\|U^{*}-V^{*}\|^{2}] \tag{94}\\
&= \mathbb{E}[\|U_{G}-V_{G}\|^{2}] \tag{95}\\
&\ge W_{2}^{2}(P_{U_{G}},P_{V_{G}}) \tag{96}\\
&= \inf_{\substack{\hat{P}_{UV}:\\ \hat{P}_{U}=P_{U_{G}}\\ \hat{P}_{V}=P_{V_{G}}}}\mathbb{E}_{\hat{P}}[\|U-V\|^{2}] \tag{97}\\
&= \inf_{\substack{\hat{P}_{UV}:\\ \hat{P}_{U}=P_{X}\\ \hat{P}_{V}=P_{\hat{X}_{G}}}}\mathbb{E}_{\hat{P}}[\|U-V\|^{2}] \tag{98}\\
&= W_{2}^{2}(P_{X},P_{\hat{X}_{G}}), \tag{99}
\end{align}

where

  • (94) follows from the definition in (91);

  • (95) follows from (92), which states that $(U^{*},V^{*})$ and $(U_{G},V_{G})$ have the same first- and second-order statistics;

  • (96) holds because $(U_{G},V_{G})$ is one particular coupling of $P_{U_{G}}$ and $P_{V_{G}}$, while (97) and (99) are the definition of the Wasserstein-2 distance;

  • (98) follows because $P_{V_{G}}=P_{\hat{X}_{G}}$ and $P_{U_{G}}=P_{X}$, which are justified as follows. First, both $P_{V_{G}}$ and $P_{\hat{X}_{G}}$ are Gaussian, and a Gaussian distribution is determined by its first- and second-order statistics. According to (92), the first- and second-order statistics of $V_{G}$ are equal to those of $V^{*}$. Also, from (91), $P_{V^{*}}=P_{\hat{X}^{*}}$, hence $V^{*}$ and $\hat{X}^{*}$ have the same first- and second-order statistics. On the other hand, from (79), the first- and second-order statistics of $\hat{X}^{*}$ are equal to those of $\hat{X}_{G}$. Thus, $P_{V_{G}}=P_{\hat{X}_{G}}$. A similar argument shows that $P_{U_{G}}=P_{X}$.

Thus, without loss of optimality, one can replace $\hat{X}^{*}$ by $\hat{X}_{G}$: the rate does not increase, and the distortion and perception constraints remain satisfied.
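Step (96) uses only the fact that any coupling costs at least $W_{2}^{2}$. For zero-mean Gaussians, $W_{2}^{2}$ has the well-known closed form $\operatorname{tr}\Sigma_{1}+\operatorname{tr}\Sigma_{2}-2\operatorname{tr}\big((\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2})^{1/2}\big)$. The following is a minimal numpy sketch, assuming zero means and randomly generated covariances (these are illustrative and not taken from the paper), that checks a random jointly Gaussian coupling never beats this value and that the standard optimal linear map attains it. The function names are hypothetical.

```python
import numpy as np


def psd_sqrt(S):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # drop tiny negative round-off eigenvalues
    return (V * np.sqrt(w)) @ V.T


def w2_sq_gaussian(S1, S2):
    """Closed-form W_2^2 between N(0, S1) and N(0, S2)."""
    r1 = psd_sqrt(S1)
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(psd_sqrt(r1 @ S2 @ r1)))


def coupling_cost(S1, S2, C):
    """E||U - V||^2 for zero-mean (U, V) with Cov(U)=S1, Cov(V)=S2, E[U V^T]=C."""
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(C))


def optimal_cross_cov(S1, S2):
    """E[U V^T] under the optimal linear map V = T U (S1 assumed invertible)."""
    r1 = psd_sqrt(S1)
    r1_inv = np.linalg.inv(r1)
    T = r1_inv @ psd_sqrt(r1 @ S2 @ r1) @ r1_inv
    return S1 @ T  # E[U (T U)^T] = S1 T^T = S1 T, since T is symmetric


rng = np.random.default_rng(0)
n = 3
for _ in range(100):
    A = rng.standard_normal((2 * n, 2 * n))
    M = A @ A.T  # random joint covariance of (U, V), i.e. a random Gaussian coupling
    S1, S2, C = M[:n, :n], M[n:, n:], M[:n, n:]
    # Inequality (96): any coupling costs at least W_2^2.
    assert coupling_cost(S1, S2, C) >= w2_sq_gaussian(S1, S2) - 1e-9
```

In the diagonal case the closed form reduces to $\sum_{i}(\sqrt{a_{i}}-\sqrt{b_{i}})^{2}$, which is a convenient sanity check.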

Appendix C Proof of Theorem 3

We aim to establish the RDP function for the case of KL-divergence as the perception metric by showing that

\begin{align}
R(D,P)=R^{*}(D,P), \tag{100}
\end{align}

where

\begin{align}
R^{*}(D,P)=\min_{\{\hat{\lambda}_{\ell},\gamma_{\ell}\}_{\ell=1}^{L}}\;& \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}} \tag{101a}\\
\text{s.t.}\;& 0<\gamma_{\ell}\leq\lambda_{\ell}, \tag{101b}\\
& 0\leq\hat{\lambda}_{\ell}, \tag{101c}\\
& \sum_{\ell=1}^{L}\Big(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\Big)\leq D, \tag{101d}\\
& \frac{1}{2}\sum_{\ell=1}^{L}\Big(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\Big)\leq P. \tag{101e}
\end{align}
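To make program (101) concrete, the sketch below evaluates its objective and constraints for given per-component values. It is a feasibility checker, not a solver, and the function names are hypothetical. We assume natural logarithms (rates in nats). For the scalar case $L=1$ with $0<D\leq\lambda$, a short calculation from (101) suggests two closed-form points: with the perception constraint inactive, $\gamma=D$ and $\hat{\lambda}=\lambda-D$ (recovering the classical $\tfrac12\log(\lambda/D)$); with perfect perception ($P=0$), $\hat{\lambda}=\lambda$ and $\gamma=D-D^{2}/(4\lambda)$.

```python
import math


def component_terms(lam, lam_hat, gamma):
    """Per-component rate (101a), distortion (101d) and KL perception loss (101e)."""
    rate = 0.5 * math.log(lam / gamma)
    dist = lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat
    if lam_hat == 0.0:
        perc = math.inf  # the KL term in (101e) diverges for a degenerate reconstruction
    else:
        perc = 0.5 * (lam_hat / lam - 1.0 + math.log(lam / lam_hat))
    return rate, dist, perc


def evaluate(lams, lam_hats, gammas, D, P, tol=1e-12):
    """Return the objective (101a) and whether (101b)-(101e) all hold."""
    if not all(0.0 < g <= l for g, l in zip(gammas, lams)):  # (101b)
        return math.inf, False
    if not all(h >= 0.0 for h in lam_hats):  # (101c)
        return math.inf, False
    terms = [component_terms(l, h, g) for l, h, g in zip(lams, lam_hats, gammas)]
    rate = sum(t[0] for t in terms)
    dist = sum(t[1] for t in terms)
    perc = sum(t[2] for t in terms)
    feasible = dist <= D + tol and perc <= P + tol  # (101d), (101e)
    return rate, feasible


# Scalar (L = 1) illustration with lambda = 1 and D = 0.5 (values chosen arbitrarily).
lam, D = 1.0, 0.5
# Perception constraint inactive (P large): gamma = D, lam_hat = lam - D.
r_free, ok_free = evaluate([lam], [lam - D], [D], D, P=1.0)
# Perfect perception (P = 0): lam_hat = lam, gamma = D - D^2 / (4 lam).
r_perfect, ok_perfect = evaluate([lam], [lam], [D - D**2 / (4 * lam)], D, P=0.0)
```

Both points meet the distortion constraint with equality, and the perfect-perception point needs a strictly higher rate, which is the qualitative cost of the perception constraint.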

C-1 Proof of $R^{*}(D,P)\geq R(D,P)$

Let $\{\gamma_{\ell},\hat{\lambda}_{\ell}\}_{\ell=1}^{L}$ be an optimal solution of (101). For $\ell\in\{1,\ldots,L\}$, let $\hat{Z}_{G,\ell}^{*}$ be jointly Gaussian with $Z_{\ell}$, with covariance matrix as given in (27), such that the pairs $(Z_{\ell},\hat{Z}^{*}_{G,\ell})$, $\ell=1,\ldots,L$, are mutually independent (in particular, $\hat{Z}^{*}_{G,\ell}$ is independent of $Z_{\ell'}$ for all $\ell'\neq\ell$). Let $\hat{Z}_{G}^{*}=(\hat{Z}_{G,1}^{*},\ldots,\hat{Z}_{G,L}^{*})$.
Further, set $\hat{X}^{*}_{G}=\Theta^{T}\hat{Z}^{*}_{G}$. It can be verified that

\begin{align}
\mathbb{E}\big[\|X-\hat{X}^{*}_{G}\|^{2}\big]
&=\mathbb{E}\big[\|Z-\hat{Z}^{*}_{G}\|^{2}\big] \tag{102}\\
&=\sum_{\ell=1}^{L}\mathbb{E}\big[(Z_{\ell}-\hat{Z}^{*}_{G,\ell})^{2}\big] \tag{103}\\
&=\sum_{\ell=1}^{L}\Big(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\Big) \tag{104}\\
&\leq D, \tag{105}
\end{align}

and

\begin{align}
D(P_{\hat{X}^{*}_{G}}\|P_{X})
&=D(P_{\hat{Z}^{*}_{G}}\|P_{Z}) \tag{106}\\
&=\sum_{\ell=1}^{L}D(P_{\hat{Z}^{*}_{G,\ell}}\|P_{Z_{\ell}}) \tag{107}\\
&=\frac{1}{2}\sum_{\ell=1}^{L}\Big(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\Big) \tag{108}\\
&\leq P, \tag{109}
\end{align}

where (102) and (106) follow from the invariance of the Euclidean distance and the KL divergence, respectively, under unitary transformations, and (107) holds because both $\hat{Z}^{*}_{G}$ and $Z$ have independent components. Hence $\hat{X}^{*}_{G}$ satisfies both constraints, so $R(D,P)\leq I(X;\hat{X}^{*}_{G})$. On the other hand,

\begin{align}
I(X;\hat{X}^{*}_{G})
&=I(Z;\hat{Z}^{*}_{G}) \tag{110}\\
&=\sum_{\ell=1}^{L}I(Z_{\ell};\hat{Z}^{*}_{G,\ell}) \tag{111}\\
&=\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}} \tag{112}\\
&=R^{*}(D,P). \tag{113}
\end{align}

This proves $R^{*}(D,P)\geq R(D,P)$.
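The achievability step rests on per-component Gaussian identities. Equation (27) is not reproduced in this section, so the sketch below assumes that the covariance of $(Z_{\ell},\hat{Z}^{*}_{G,\ell})$ has variances $\lambda_{\ell},\hat{\lambda}_{\ell}$ and cross-covariance $\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}$, which is what (104) requires. Under that assumption it computes the mutual information (112), the distortion term (104), and the MMSE, and cross-checks the last two by Monte Carlo. The parameter values are illustrative only, and the function names are hypothetical.

```python
import numpy as np


def pair_cov(lam, lam_hat, gamma):
    """Assumed covariance of (Z_l, Zhat_l); the off-diagonal entry is inferred from (104)."""
    c = np.sqrt(lam_hat * (lam - gamma))
    return np.array([[lam, c], [c, lam_hat]])


def closed_forms(lam, lam_hat, gamma):
    """Mutual information (112), MSE (104) and MMSE implied by pair_cov."""
    c = pair_cov(lam, lam_hat, gamma)[0, 1]
    mi = -0.5 * np.log(1.0 - c**2 / (lam * lam_hat))  # Gaussian pair: -1/2 log(1 - rho^2)
    mse = lam - 2.0 * c + lam_hat  # E[(Z_l - Zhat_l)^2]
    mmse = lam - c**2 / lam_hat  # E[(Z_l - E[Z_l | Zhat_l])^2]
    return mi, mse, mmse


def monte_carlo(lam, lam_hat, gamma, n=200_000, seed=0):
    """Empirical MSE and linear-regression residual variance from samples."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(np.zeros(2), pair_cov(lam, lam_hat, gamma), size=n)
    z, zh = samples.T
    mse = np.mean((z - zh) ** 2)
    slope = np.mean(z * zh) / np.mean(zh**2)  # E[Z | Zhat] = slope * Zhat for zero means
    resid = np.mean((z - slope * zh) ** 2)
    return mse, resid


lam, lam_hat, gamma = 1.0, 0.8, 0.3  # arbitrary illustrative values with 0 < gamma <= lam
mi, mse, mmse = closed_forms(lam, lam_hat, gamma)
mse_mc, resid_mc = monte_carlo(lam, lam_hat, gamma)
```

With this cross-covariance the MMSE equals $\gamma_{\ell}$ exactly, which is why (112) takes the form $\tfrac12\log(\lambda_{\ell}/\gamma_{\ell})$.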

C-2 Proof of $R^{*}(D,P)\leq R(D,P)$

It follows from Theorem 2 that

\begin{align}
R(D,P)=\inf_{P_{\hat{X}_{G}|X}}\;& I(X;\hat{X}_{G}) \tag{114a}\\
\text{s.t.}\;& \mathbb{E}\big[\|X-\hat{X}_{G}\|^{2}\big]\leq D, \tag{114b}\\
& D(P_{\hat{X}_{G}}\|P_{X})\leq P, \tag{114c}
\end{align}

where $\hat{X}_{G}$ has mean zero and is jointly Gaussian with $X$. Let $P_{\hat{X}_{G}^{*}|X}$ be an optimal conditional distribution for the program in (114), and define $\hat{Z}^{*}_{G}=\Theta\hat{X}^{*}_{G}$. Let $\Sigma_{\hat{X}^{*}_{G}}$ denote the covariance matrix of $\hat{X}^{*}_{G}$, and let $\Lambda_{\hat{Z}^{*}_{G}}$ be the diagonal matrix whose diagonal entries coincide with those of $\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}$, i.e.,

\begin{align}
\Lambda_{\hat{Z}^{*}_{G}}=\mathrm{diag}^{L}(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{L}). \tag{115}
\end{align}

Furthermore, define

\begin{align}
\gamma_{\ell}=\mathbb{E}\big[(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}])^{2}\big],\qquad \ell\in\{1,\ldots,L\}. \tag{116}
\end{align}

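Constraints (101b) and (101c) are satisfied: $\hat{\lambda}_{\ell}\geq0$ because it is a diagonal entry of a covariance matrix, $\gamma_{\ell}\leq\lambda_{\ell}$ because a minimum mean-squared error cannot exceed the variance $\lambda_{\ell}$ of $Z_{\ell}$, and $\gamma_{\ell}>0$ since otherwise $I(X;\hat{X}^{*}_{G})$ would be infinite.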

It can be verified that

\begin{align}
I(X;\hat{X}^{*}_{G})
&=I(Z;\hat{Z}^{*}_{G}) \tag{117}\\
&=h(Z)-h(Z|\hat{Z}^{*}_{G}) \tag{118}\\
&=\sum_{\ell=1}^{L}h(Z_{\ell})-h(Z|\hat{Z}^{*}_{G}) \tag{119}\\
&\geq\sum_{\ell=1}^{L}h(Z_{\ell})-\sum_{\ell=1}^{L}h(Z_{\ell}|\hat{Z}^{*}_{G,\ell}) \tag{120}\\
&=\sum_{\ell=1}^{L}h(Z_{\ell})-\sum_{\ell=1}^{L}h\big(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}]\,\big|\,\hat{Z}^{*}_{G,\ell}\big) \tag{121}\\
&=\sum_{\ell=1}^{L}h(Z_{\ell})-\sum_{\ell=1}^{L}h\big(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}]\big) \tag{122}\\
&=\sum_{\ell=1}^{L}\frac{1}{2}\log\big((2\pi e)\lambda_{\ell}\big)-\sum_{\ell=1}^{L}\frac{1}{2}\log\big((2\pi e)\gamma_{\ell}\big) \tag{123}\\
&=\sum_{\ell=1}^{L}\frac{1}{2}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}, \tag{124}
\end{align}

where

  • (117) is due to the invertibility of unitary transformations,

  • (119) follows because $Z_{1},\ldots,Z_{L}$ are independent,

  • (120) follows from the chain rule and the fact that conditioning does not increase differential entropy,

  • (121) holds because differential entropy is invariant to a shift by a function of the conditioning variable,

  • (122) follows because $Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}]$ is independent of $\hat{Z}^{*}_{G,\ell}$: the estimation error is uncorrelated with the observation, and the two are jointly Gaussian,

  • (123) follows because $Z_{\ell}$ and $Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}]$ are Gaussian with $\mathbb{E}[Z_{\ell}^{2}]=\lambda_{\ell}$ and $\mathbb{E}[(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}])^{2}]=\gamma_{\ell}$.
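The chain (117)-(124) lower-bounds the mutual information of any jointly Gaussian reconstruction by the component-wise quantity in (101a), using only the diagonal entries (115) and the per-component MMSEs (116). The numpy sketch below, assuming a hypothetical random linear-Gaussian test channel $\hat{Z}=AZ+N$ (not from the paper), compares the exact Gaussian mutual information $\tfrac12\log\det\Sigma_{Z}-\tfrac12\log\det\Sigma_{Z|\hat{Z}}$ with that bound. The function names are illustrative.

```python
import numpy as np


def gaussian_mi(cov_z, cov_zh, cross):
    """I(Z; Zhat) in nats for jointly Gaussian vectors; cross = E[Z Zhat^T]."""
    cond = cov_z - cross @ np.linalg.solve(cov_zh, cross.T)  # Cov(Z | Zhat)
    _, logdet_z = np.linalg.slogdet(cov_z)
    _, logdet_c = np.linalg.slogdet(cond)
    return 0.5 * (logdet_z - logdet_c)


def componentwise_bound(lams, cov_zh, cross):
    """sum_l 1/2 log(lam_l / gamma_l), with gamma_l the MMSE of Z_l given Zhat_l only."""
    lam_hat = np.diag(cov_zh)  # diagonal entries, cf. (115)
    gammas = lams - np.diag(cross) ** 2 / lam_hat  # per-component MMSE, cf. (116)
    return 0.5 * np.sum(np.log(lams / gammas))


rng = np.random.default_rng(0)
L = 4
for _ in range(200):
    lams = rng.uniform(0.5, 3.0, size=L)  # eigenvalues of the source covariance
    A = rng.standard_normal((L, L))  # hypothetical channel gain
    B = rng.standard_normal((L, L))
    cov_n = B @ B.T + 0.1 * np.eye(L)  # hypothetical (correlated) noise covariance
    cov_z = np.diag(lams)
    cov_zh = A @ cov_z @ A.T + cov_n
    cross = cov_z @ A.T  # E[Z Zhat^T] = Lambda A^T
    # Inequality (120): the joint rate is at least the sum of per-component rates.
    assert gaussian_mi(cov_z, cov_zh, cross) >= componentwise_bound(lams, cov_zh, cross) - 1e-9
```

When both $A$ and the noise covariance are diagonal, the components decouple and the two quantities coincide, which is the equality case of (120).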

Next, consider the expected distortion loss as follows:

\begin{align}
D\geq\mathbb{E}\big[\|X-\hat{X}^{*}_{G}\|^{2}\big]
&=\mathbb{E}\big[\|Z-\hat{Z}^{*}_{G}\|^{2}\big] \tag{125}\\
&=\sum_{\ell=1}^{L}\mathbb{E}\big[(Z_{\ell}-\hat{Z}^{*}_{G,\ell})^{2}\big] \tag{126}\\
&=\sum_{\ell=1}^{L}\Big(\mathbb{E}[Z_{\ell}^{2}]-2\mathbb{E}[Z_{\ell}\hat{Z}^{*}_{G,\ell}]+\mathbb{E}\big[(\hat{Z}^{*}_{G,\ell})^{2}\big]\Big) \tag{127}\\
&=\sum_{\ell=1}^{L}\Big(\lambda_{\ell}-2\mathbb{E}[Z_{\ell}\hat{Z}^{*}_{G,\ell}]+\hat{\lambda}_{\ell}\Big) \tag{128}\\
&=\sum_{\ell=1}^{L}\Big(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\Big), \tag{129}
\end{align}

where

  • (125) is due to the invariance of Euclidean distance under unitary transformations,

  • (128) follows because $\mathbb{E}[Z_{\ell}^{2}]=\lambda_{\ell}$ and $\mathbb{E}[(\hat{Z}^{*}_{G,\ell})^{2}]=\hat{\lambda}_{\ell}$,

  • (129) follows from the identity $\mathbb{E}[(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}])^{2}]=\mathbb{E}[Z_{\ell}^{2}]-(\mathbb{E}[Z_{\ell}\hat{Z}^{*}_{G,\ell}])^{2}(\mathbb{E}[(\hat{Z}^{*}_{G,\ell})^{2}])^{-1}$ for the zero-mean, jointly Gaussian pair $(Z_{\ell},\hat{Z}^{*}_{G,\ell})$, together with $\mathbb{E}[(Z_{\ell}-\mathbb{E}[Z_{\ell}|\hat{Z}^{*}_{G,\ell}])^{2}]=\gamma_{\ell}$, $\mathbb{E}[Z_{\ell}^{2}]=\lambda_{\ell}$, and $\mathbb{E}[(\hat{Z}^{*}_{G,\ell})^{2}]=\hat{\lambda}_{\ell}$, which give $(\mathbb{E}[Z_{\ell}\hat{Z}^{*}_{G,\ell}])^{2}=\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})$.
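As a sanity check on (127) to (129), the following Python sketch draws a scalar zero-mean jointly Gaussian pair with $\mathbb{E}[Z_{\ell}^{2}]=\lambda_{\ell}$, $\mathbb{E}[(\hat{Z}^{*}_{G,\ell})^{2}]=\hat{\lambda}_{\ell}$ and MMSE $\gamma_{\ell}$, and compares the empirical mean squared error with the closed form in (129). The construction $Z_{\ell}=(c/\hat{\lambda}_{\ell})\hat{Z}^{*}_{G,\ell}+W$ with an independent Gaussian $W$, the choice of the nonnegative root $c=\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}$, and the parameter values are illustrative assumptions, not part of the proof.

```python
import math
import random


def distortion_closed_form(lam, lam_hat, gamma):
    """Per-component distortion from (129): lam - 2*sqrt(lam_hat*(lam - gamma)) + lam_hat."""
    return lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat


def sample_pair(lam, lam_hat, gamma, rng):
    """Draw (Z, Z_hat), zero-mean jointly Gaussian, with E[Z^2] = lam,
    E[Z_hat^2] = lam_hat and MMSE E[(Z - E[Z|Z_hat])^2] = gamma (assumed construction)."""
    c = math.sqrt(lam_hat * (lam - gamma))   # cross-correlation E[Z Z_hat], nonnegative root
    z_hat = rng.gauss(0.0, math.sqrt(lam_hat))
    w = rng.gauss(0.0, math.sqrt(gamma))     # estimation error, independent of Z_hat
    z = (c / lam_hat) * z_hat + w            # so that E[Z | Z_hat] = (c / lam_hat) * Z_hat
    return z, z_hat


def distortion_monte_carlo(lam, lam_hat, gamma, n=100_000, seed=0):
    """Empirical E[(Z - Z_hat)^2] over n independent draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z, z_hat = sample_pair(lam, lam_hat, gamma, rng)
        total += (z - z_hat) ** 2
    return total / n


lam, lam_hat, gamma = 2.0, 1.5, 0.5          # illustrative values only
c = math.sqrt(lam_hat * (lam - gamma))
print("MMSE from the identity:", lam - c ** 2 / lam_hat)
print("closed form (129):     ", distortion_closed_form(lam, lam_hat, gamma))
print("Monte Carlo estimate:  ", distortion_monte_carlo(lam, lam_hat, gamma))
```

The Monte Carlo value should agree with (129) up to sampling error of order $1/\sqrt{n}$.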

Finally, consider the perception loss:

\begin{align}
P\geq D(P_{\hat{X}^{*}_{G}}\|P_{X}) &= \frac{1}{2}\left(\text{tr}(\Lambda_{X}^{-1}\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T})-L+\log\frac{\det(\Lambda_{X})}{\det(\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T})}\right) \tag{130}\\
&= \frac{1}{2}\left(\text{tr}(\Lambda_{X}^{-1}\Lambda_{\hat{Z}^{*}_{G}})-L+\log\frac{\det(\Lambda_{X})}{\det(\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T})}\right) \tag{131}\\
&\geq \frac{1}{2}\left(\text{tr}(\Lambda_{X}^{-1}\Lambda_{\hat{Z}^{*}_{G}})-L+\log\frac{\det(\Lambda_{X})}{\det(\Lambda_{\hat{Z}^{*}_{G}})}\right) \tag{132}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right), \tag{133}
\end{align}

where

  • (131) follows because $\Lambda_{X}^{-1}$ is diagonal, so the trace depends only on the diagonal elements of $\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}$, which are equal to the diagonal elements of $\Lambda_{\hat{Z}^{*}_{G}}$,

  • (132) follows from Hadamard's inequality for positive semidefinite matrices, $\det(\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T})\leq\prod_{\ell=1}^{L}\hat{\lambda}_{\ell}=\det(\Lambda_{\hat{Z}^{*}_{G}})$,

  • (133) follows by expressing the trace and the determinants of the diagonal matrices $\Lambda_{X}$ and $\Lambda_{\hat{Z}^{*}_{G}}$ through their diagonal entries $\lambda_{\ell}$ and $\hat{\lambda}_{\ell}$.
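The chain (130) to (133) can be checked numerically for $L=2$. The Python sketch below assumes an orthogonal $\Theta$ with $\Theta\Sigma_{X}\Theta^{T}=\Lambda_{X}$, picks an arbitrary reconstruction covariance, evaluates the Gaussian KL divergence $D(P_{\hat{X}}\|P_{X})$ directly, and compares it with the diagonal bound (133). The helper names and the numbers are illustrative only; equality should hold exactly when $\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}$ is diagonal.

```python
import math


# Minimal 2x2 helpers (pure Python, to keep the sketch dependency-free).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def rotation(angle):
    return [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]


def kl_gauss(sigma_hat, sigma_x):
    """D(N(0, sigma_hat) || N(0, sigma_x)) in nats, as in (130) with L = 2."""
    prod = matmul(inv2(sigma_x), sigma_hat)
    trace = prod[0][0] + prod[1][1]
    return 0.5 * (trace - 2 + math.log(det2(sigma_x) / det2(sigma_hat)))


def diagonal_bound(m, lam_x):
    """Right-hand side of (133): uses only the diagonal of m = Theta Sigma_hat Theta^T."""
    return 0.5 * sum(m[i][i] / lam_x[i] - 1 + math.log(lam_x[i] / m[i][i]) for i in range(2))


lam_x = [3.0, 0.5]                       # eigenvalues of Sigma_X (illustrative)
theta = rotation(0.7)                    # assumed: Theta Sigma_X Theta^T = Lambda_X
sigma_x = matmul(matmul(transpose(theta), [[lam_x[0], 0.0], [0.0, lam_x[1]]]), theta)
sigma_hat = [[1.2, 0.4], [0.4, 0.9]]     # an arbitrary positive definite reconstruction covariance
m = matmul(matmul(theta, sigma_hat), transpose(theta))   # covariance of Z_hat = Theta X_hat

print("KL divergence (130):", kl_gauss(sigma_hat, sigma_x))
print("diagonal bound (133):", diagonal_bound(m, lam_x))
print("Hadamard: det(m) =", det2(m), "<= product of diagonal =", m[0][0] * m[1][1])
```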

Combining (124), (129), and (133) yields $R^{*}(D,P)\leq R(D,P)$.

Appendix D Proof of Theorem 4

First, we show that the optimization problem in (101) is convex. The second derivative of the objective function (101a) with respect to $\gamma_{\ell}$ is $\frac{1}{2\gamma_{\ell}^{2}}$, which is positive. The second derivative of the function in the constraint (101e) with respect to $\hat{\lambda}_{\ell}$ is $\frac{1}{2\hat{\lambda}_{\ell}^{2}}$, which is again positive. It remains to study the constraint (101d). The Hessian matrix of the function in this constraint, with the variables ordered as $(\hat{\lambda}_{\ell},\gamma_{\ell})$, is

\begin{equation}
\begin{bmatrix}
\frac{\sqrt{\lambda_{\ell}-\gamma_{\ell}}}{2\sqrt{\hat{\lambda}^{3}_{\ell}}} & \frac{1}{2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}}\\
\frac{1}{2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}} & \frac{\sqrt{\hat{\lambda}_{\ell}}}{2\sqrt{(\lambda_{\ell}-\gamma_{\ell})^{3}}}
\end{bmatrix}. \tag{134}
\end{equation}

The determinant of the above matrix is zero, and the matrix has positive diagonal terms. Thus, it is a positive semidefinite matrix, which implies the convexity of the associated function. This proves the convexity of the program in (101).
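A quick numerical cross-check of (134): the Python sketch below compares the analytic Hessian of $\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}$ (variables ordered as $(\hat{\lambda}_{\ell},\gamma_{\ell})$) with central finite differences, and confirms the zero determinant and nonnegative eigenvalues at one illustrative point. Since it only probes a single point (plus one midpoint-convexity check), it illustrates the argument rather than proving convexity; the point and step size are arbitrary choices.

```python
import math


def distortion_term(lam_hat, gamma, lam):
    """Per-component distortion from (129), as a function of (lam_hat, gamma)."""
    return lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat


def hessian_analytic(lam_hat, gamma, lam):
    """Entries of (134); variables ordered as (lam_hat, gamma)."""
    u = lam - gamma
    h11 = math.sqrt(u) / (2.0 * lam_hat ** 1.5)
    h12 = 1.0 / (2.0 * math.sqrt(lam_hat * u))
    h22 = math.sqrt(lam_hat) / (2.0 * u ** 1.5)
    return [[h11, h12], [h12, h22]]


def hessian_fd(func, x, y, h=1e-4):
    """Central finite-difference Hessian of func(x, y)."""
    fxx = (func(x + h, y) - 2.0 * func(x, y) + func(x - h, y)) / h ** 2
    fyy = (func(x, y + h) - 2.0 * func(x, y) + func(x, y - h)) / h ** 2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4.0 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]


def min_eigenvalue(hm):
    """Smaller eigenvalue of a symmetric 2x2 matrix."""
    tr = hm[0][0] + hm[1][1]
    det = hm[0][0] * hm[1][1] - hm[0][1] ** 2
    return 0.5 * (tr - math.sqrt(max(tr * tr - 4.0 * det, 0.0)))


lam, lam_hat, gamma = 2.0, 1.0, 0.5      # illustrative interior point


def f(x, y):
    return distortion_term(x, y, lam)


ha = hessian_analytic(lam_hat, gamma, lam)
hn = hessian_fd(f, lam_hat, gamma)
print("analytic:", ha)
print("finite differences:", hn)
print("det:", ha[0][0] * ha[1][1] - ha[0][1] ** 2, "min eigenvalue:", min_eigenvalue(ha))
```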

Since $(D,P)$ is assumed to be strictly feasible, Slater's condition is satisfied. Hence strong duality holds, and the optimal value of (101) equals that of the following dual optimization problem:

\begin{align}
\max_{\nu_{1},\nu_{2},\eta_{\ell},\xi_{\ell}\geq 0}\;\min_{\{\gamma_{\ell},\hat{\lambda}_{\ell}\}_{\ell=1}^{L}}\;\; &\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\nu_{1}\left(\sum_{\ell=1}^{L}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\right)-D\right) \nonumber\\
&+\nu_{2}\left(\frac{1}{2}\sum_{\ell=1}^{L}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right)-P\right)+\sum_{\ell=1}^{L}\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell})-\sum_{\ell=1}^{L}\eta_{\ell}\hat{\lambda}_{\ell}, \tag{135}
\end{align}

where $\{\nu_{1},\nu_{2}\}$ and $\{\xi_{\ell},\eta_{\ell}\}_{\ell=1}^{L}$ are nonnegative Lagrange multipliers. Note that the distortion function carries the implicit constraints $\hat{\lambda}_{\ell}\geq 0$ and $\gamma_{\ell}\leq\lambda_{\ell}$, under which $\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}$ is well defined. Moreover, the derivatives of the respective terms go to infinity as $\hat{\lambda}_{\ell}$ and $\gamma_{\ell}$ approach these boundaries. For this reason, we cannot immediately write down the Karush-Kuhn-Tucker (KKT) conditions for the optimization problem; instead, we need to carefully consider the behaviour of the optimization problem close to these boundaries. Toward this end, we consider the following three cases.

D-1 Case Where the Maximum for the Outer Optimization Occurs at $\nu_{1},\nu_{2}>0$

This is the case where both the perception and the distortion constraints are active. Let $\hat{\lambda}^{*}_{\ell}$ and $\gamma_{\ell}^{*}$ be the optimal solution to the inner minimization problem in (135) for the optimal $\nu_{1}$ and $\nu_{2}$. We first note that

\begin{equation}
\hat{\lambda}^{*}_{\ell}>0. \tag{136}
\end{equation}

This is because if $\hat{\lambda}^{*}_{\ell}=0$, then the term $\log\frac{\lambda_{\ell}}{\hat{\lambda}^{*}_{\ell}}$ makes the perception loss infinite, which would violate the perception constraint for any finite $P$.

Next, we show that the following strict inequality holds:

\begin{equation}
\gamma_{\ell}^{*}<\lambda_{\ell}. \tag{137}
\end{equation}

Suppose that the above strict inequality does not hold, i.e., $\gamma_{\ell}^{*}=\lambda_{\ell}$. We show that such a $\gamma_{\ell}^{*}$ cannot be the optimal solution to the inner minimization problem.

The Lagrangian in (135) depends on $\gamma_{\ell}$ and $\hat{\lambda}_{\ell}$ only through the following function:

\begin{align}
G_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell}) &= \frac{1}{2}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\nu_{1}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\right)+\frac{\nu_{2}}{2}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right) \nonumber\\
&\qquad+\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell})-\eta_{\ell}\hat{\lambda}_{\ell}. \tag{138}
\end{align}

Fix $\hat{\lambda}_{\ell}=\hat{\lambda}_{\ell}^{*}$. When we deviate from $\gamma_{\ell}^{*}=\lambda_{\ell}$ to $\gamma_{\ell}^{\prime}=\lambda_{\ell}-\epsilon$ for some small $\epsilon>0$, the resulting change in $G_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell}^{*})$ is

\begin{align}
G_{\ell}(\gamma^{*}_{\ell},\hat{\lambda}^{*}_{\ell})-G_{\ell}(\gamma^{\prime}_{\ell},\hat{\lambda}^{*}_{\ell}) &= \frac{1}{2}\log\frac{\lambda_{\ell}-\epsilon}{\lambda_{\ell}}+2\nu_{1}\sqrt{\epsilon\hat{\lambda}^{*}_{\ell}}+\epsilon\xi_{\ell} \tag{139}\\
&= -\frac{\epsilon}{2\lambda_{\ell}}+2\nu_{1}\sqrt{\epsilon\hat{\lambda}^{*}_{\ell}}+\epsilon\xi_{\ell}+O(\epsilon^{2}) \tag{140}\\
&= 2\nu_{1}\sqrt{\epsilon\hat{\lambda}^{*}_{\ell}}+O(\epsilon), \tag{141}
\end{align}

where we use the fact that $\log(1-x)=-x+O(x^{2})$ for small $x$. Since $\nu_{1}>0$ and $\hat{\lambda}_{\ell}^{*}>0$, the $\sqrt{\epsilon}$ term in (141) dominates the $O(\epsilon)$ terms, so for sufficiently small $\epsilon>0$ the difference is strictly positive: replacing $\gamma^{*}_{\ell}$ with $\gamma^{\prime}_{\ell}$ strictly decreases $G_{\ell}(\cdot,\hat{\lambda}^{*}_{\ell})$ while satisfying the implicit constraints. This contradicts the assumption that $\gamma^{*}_{\ell}=\lambda_{\ell}$ is the optimal solution to the inner minimization problem. This proves (137), which implies that every component has a positive rate $\frac{1}{2}\log\frac{\lambda_{\ell}}{\gamma^{*}_{\ell}}>0$.
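The $\sqrt{\epsilon}$ versus $\epsilon$ argument in (139) to (141) can be seen numerically. The Python sketch below evaluates $G_{\ell}$ from (138) at $\gamma_{\ell}=\lambda_{\ell}$ and at $\gamma_{\ell}=\lambda_{\ell}-\epsilon$ for a fixed $\hat{\lambda}_{\ell}$. The values of $\lambda_{\ell}$, $\hat{\lambda}_{\ell}$ and the multipliers are arbitrary illustrative choices, not outputs of the optimization. For moderate $\epsilon$ the $-\epsilon/(2\lambda_{\ell})$ term can dominate, but as $\epsilon\to 0$ the gap becomes positive and approaches $2\nu_{1}\sqrt{\epsilon\hat{\lambda}^{*}_{\ell}}$.

```python
import math


def G(gamma, lam_hat, lam, nu1, nu2, xi=0.0, eta=0.0):
    """Per-component Lagrangian term G_ell from (138)."""
    rate = 0.5 * math.log(lam / gamma)
    dist = lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat
    perc = 0.5 * (lam_hat / lam - 1.0 + math.log(lam / lam_hat))
    return rate + nu1 * dist + nu2 * perc + xi * (gamma - lam) - eta * lam_hat


def gap(eps, lam, lam_hat, nu1, nu2, xi=0.0):
    """G(lam, lam_hat) - G(lam - eps, lam_hat); a positive value means that
    moving gamma slightly below lam strictly decreases G."""
    return G(lam, lam_hat, lam, nu1, nu2, xi) - G(lam - eps, lam_hat, lam, nu1, nu2, xi)


lam, lam_hat, nu1, nu2 = 0.1, 0.8, 0.05, 0.3   # illustrative values only
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    lead = 2.0 * nu1 * math.sqrt(eps * lam_hat)  # leading term in (141)
    print(f"eps={eps:.0e}  gap={gap(eps, lam, lam_hat, nu1, nu2):+.3e}  leading term={lead:.3e}")
```

The perception term cancels in the difference because $\hat{\lambda}_{\ell}$ is held fixed, and a nonnegative $\xi_{\ell}$ only enlarges the gap.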

The strict inequalities in (136) and (137) imply that, in this case, the optimal solution lies in the interior of the set $\{(\gamma_{\ell},\hat{\lambda}_{\ell}):\hat{\lambda}_{\ell}\geq 0,\ \gamma_{\ell}\leq\lambda_{\ell}\}$. This allows us to write down the KKT conditions for the optimal primal variables $(\gamma_{\ell}^{*},\hat{\lambda}_{\ell}^{*})$ and the optimal dual variables $\{\nu_{1},\nu_{2}\}$ and $\{\xi_{\ell},\eta_{\ell}\}_{\ell=1}^{L}$ as follows:

\begin{align}
\frac{1}{2\gamma^{*}_{\ell}}-\nu_{1}\sqrt{\frac{\hat{\lambda}_{\ell}^{*}}{\lambda_{\ell}-\gamma^{*}_{\ell}}}-\xi_{\ell} &= 0, \tag{142a}\\
\nu_{1}\left(1-\sqrt{\frac{\lambda_{\ell}-\gamma^{*}_{\ell}}{\hat{\lambda}^{*}_{\ell}}}\right)+\frac{1}{2}\nu_{2}\left(\frac{1}{\lambda_{\ell}}-\frac{1}{\hat{\lambda}^{*}_{\ell}}\right)-\eta_{\ell} &= 0, \tag{142b}\\
\xi_{\ell}(\gamma^{*}_{\ell}-\lambda_{\ell}) &= 0, \tag{142c}\\
\eta_{\ell}\hat{\lambda}^{*}_{\ell} &= 0, \tag{142d}\\
\nu_{1}\left(\sum_{\ell=1}^{L}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}^{*}(\lambda_{\ell}-\gamma_{\ell}^{*})}+\hat{\lambda}_{\ell}^{*}\right)-D\right) &= 0, \tag{142e}\\
\nu_{2}\left(\sum_{\ell=1}^{L}\frac{1}{2}\left(\frac{\hat{\lambda}_{\ell}^{*}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}^{*}}\right)-P\right) &= 0, \tag{142f}
\end{align}

along with the primal and dual feasibility constraints, i.e., $\eta_{\ell}\geq 0$, $\xi_{\ell}\geq 0$, and (34e)-(34e).

Due to the strict inequalities (137) and (136), the complementary slackness conditions (142c) and (142d) give $\xi_{\ell}=0$ and $\eta_{\ell}=0$. Then, solving (142a) for $\hat{\lambda}^{*}_{\ell}$ gives

$$\hat{\lambda}^{*}_{\ell}=\frac{\lambda_{\ell}-\gamma^{*}_{\ell}}{4\gamma^{*2}_{\ell}\nu_{1}^{2}}. \tag{143}$$

Plugging (143) into (142b) yields the following second-order equation in $\gamma^{*}_{\ell}$:

$$\nu_{1}(1-2\nu_{1}\gamma^{*}_{\ell})=\frac{1}{2}\nu_{2}\left(\frac{4\gamma^{*2}_{\ell}\nu_{1}^{2}}{\lambda_{\ell}-\gamma^{*}_{\ell}}-\frac{1}{\lambda_{\ell}}\right). \tag{144}$$

Note that as $\gamma^{*}_{\ell}$ varies from $0$ to $\lambda_{\ell}$, the left-hand side of (144) decreases monotonically from $\nu_{1}$ to $(1-2\nu_{1}\lambda_{\ell})\nu_{1}$, while the right-hand side increases monotonically from $-\frac{\nu_{2}}{2\lambda_{\ell}}$ to $+\infty$. Since the left-hand side is larger at $\gamma^{*}_{\ell}=0$ and smaller as $\gamma^{*}_{\ell}\to\lambda_{\ell}$, (144) has a unique solution in the interval $(0,\lambda_{\ell})$. After multiplying both sides by $\lambda_{\ell}-\gamma^{*}_{\ell}>0$, (144) becomes a quadratic equation, so it can be solved analytically. The solution gives (IV-C) and (37).
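As a numerical illustration, multiplying (144) by $\lambda_{\ell}-\gamma^{*}_{\ell}$ and collecting terms (our own algebra, not displayed in the paper) gives
$$2\nu_{1}^{2}(1-\nu_{2})\gamma^{*2}_{\ell}-\left(\nu_{1}+2\nu_{1}^{2}\lambda_{\ell}+\frac{\nu_{2}}{2\lambda_{\ell}}\right)\gamma^{*}_{\ell}+\nu_{1}\lambda_{\ell}+\frac{\nu_{2}}{2}=0.$$
The sketch below solves this quadratic for fixed, user-supplied multipliers $\nu_1>0$, $\nu_2>0$, selects the root in $(0,\lambda_\ell)$, recovers $\hat{\lambda}^{*}_{\ell}$ from (143), and reports the residuals of (142a) and (142b). It does not determine $\nu_1,\nu_2$ from (142e)-(142f); the function names are hypothetical.

```python
import math


def solve_gamma(lam, nu1, nu2):
    """Root of (144) in (0, lam), assuming nu1 > 0 and nu2 > 0."""
    # Multiplying (144) by (lam - g) > 0 gives a*g**2 + b*g + c = 0.
    a = 2.0 * nu1**2 * (1.0 - nu2)
    b = -(nu1 + 2.0 * nu1**2 * lam + nu2 / (2.0 * lam))
    c = nu1 * lam + nu2 / 2.0
    disc = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    # Numerically stable quadratic formula: b < 0, so q > 0 without cancellation.
    q = 0.5 * (-b + disc)
    roots = [c / q]
    if a != 0.0:  # when nu2 == 1 the equation is linear and c / q is its root
        roots.append(q / a)
    inside = [g for g in roots if 0.0 < g < lam]
    if len(inside) != 1:
        raise ValueError("expected exactly one root of (144) in (0, lam)")
    return inside[0]


def lambda_hat_from_gamma(lam, nu1, g):
    """Equation (143): optimal reconstruction variance for a given gamma."""
    return (lam - g) / (4.0 * g**2 * nu1**2)


def kkt_residuals(lam, nu1, nu2, g, h):
    """Residuals of (142a) and (142b) with xi = eta = 0."""
    r_a = 1.0 / (2.0 * g) - nu1 * math.sqrt(h / (lam - g))
    r_b = nu1 * (1.0 - math.sqrt((lam - g) / h)) + 0.5 * nu2 * (1.0 / lam - 1.0 / h)
    return r_a, r_b


# Usage with illustrative (not paper-derived) values.
lam, nu1, nu2 = 2.0, 0.5, 0.3
g = solve_gamma(lam, nu1, nu2)
h = lambda_hat_from_gamma(lam, nu1, g)
```

The stable form of the quadratic formula is used so that the case $\nu_2=1$, where the leading coefficient vanishes, needs no special branch beyond skipping the second root.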

D-2 Case Where the Maximum for the Outer Optimization Occurs at $\nu_{1}>0,\ \nu_{2}=0$

This is the case where the distortion constraint is active but the perception constraint is inactive, so the problem reduces to the traditional rate-distortion function.
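To see the reduction concretely: with $\nu_2=0$ and $\eta_\ell=0$, condition (142b) gives $\hat{\lambda}^{*}_{\ell}=\lambda_{\ell}-\gamma^{*}_{\ell}$, and for components with $\gamma^{*}_{\ell}<\lambda_{\ell}$, condition (142a) with $\xi_\ell=0$ then gives $\gamma^{*}_{\ell}=\frac{1}{2\nu_1}$. Substituting $\hat{\lambda}^{*}_{\ell}=\lambda_{\ell}-\gamma^{*}_{\ell}$ into the distortion term of (142e) leaves a per-component distortion of $\gamma^{*}_{\ell}$. This is the classical reverse water-filling solution with water level $\theta=\frac{1}{2\nu_1}$ and $\gamma^{*}_{\ell}=\min(\theta,\lambda_{\ell})$. The sketch below is an illustration under these assumptions: the function name is hypothetical, bisection on $\theta$ is our choice, and the rate uses the natural logarithm (nats).

```python
import math


def reverse_water_filling(lams, D, iters=200):
    """Classical reverse water-filling: gamma_l = min(theta, lam_l), sum = D."""
    if D <= 0.0:
        raise ValueError("D must be positive")
    if D >= sum(lams):
        # Zero rate: every component is reproduced by its mean.
        return max(lams), list(lams), [0.0] * len(lams), 0.0
    lo, hi = 0.0, max(lams)
    for _ in range(iters):  # fixed iteration count avoids float-spacing stalls
        theta = 0.5 * (lo + hi)
        if sum(min(theta, l) for l in lams) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    gammas = [min(theta, l) for l in lams]
    lam_hats = [l - g for l, g in zip(lams, gammas)]  # lambda_hat = lambda - gamma
    rate = 0.5 * sum(math.log(l / g) for l, g in zip(lams, gammas))
    return theta, gammas, lam_hats, rate


theta, gammas, lam_hats, rate = reverse_water_filling([4.0, 1.0, 0.25], 1.5)
```

In this example the component with variance $0.25$ lies below the water level and is assigned zero rate, which is the behavior that the active perception constraint rules out.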

D-3 Case Where the Maximum for the Outer Optimization Occurs at $\nu_{1}=0$

This is the case where the distortion constraint is inactive, so the inner minimization problem in (LABEL:Lagrange-dual-function) decouples into two independent minimizations, one over the $\gamma_{\ell}$'s and the other over the $\hat{\lambda}_{\ell}$'s, i.e.,

\begin{align}
&\min_{\{\gamma_{\ell},\hat{\lambda}_{\ell}\}_{\ell=1}^{L}}\;\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\nu_{2}\left(\frac{1}{2}\sum_{\ell=1}^{L}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right)-P\right)+\sum_{\ell=1}^{L}\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell})-\sum_{\ell=1}^{L}\eta_{\ell}\hat{\lambda}_{\ell} \nonumber\\
&\quad=\min_{\{\gamma_{\ell}\}_{\ell=1}^{L}}\;\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\sum_{\ell=1}^{L}\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell}) \nonumber\\
&\qquad+\min_{\{\hat{\lambda}_{\ell}\}_{\ell=1}^{L}}\;\nu_{2}\left(\frac{1}{2}\sum_{\ell=1}^{L}\left(\frac{\hat{\lambda}_{\ell}}{\lambda_{\ell}}-1+\log\frac{\lambda_{\ell}}{\hat{\lambda}_{\ell}}\right)-P\right)-\sum_{\ell=1}^{L}\eta_{\ell}\hat{\lambda}_{\ell}. \tag{145}
\end{align}

The KKT conditions for the first minimization problem in (145) are

\begin{align}
\frac{1}{2\gamma^{*}_{\ell}}-\xi_{\ell} &= 0, \tag{146}\\
\xi_{\ell}(\gamma^{*}_{\ell}-\lambda_{\ell}) &= 0. \tag{147}
\end{align}

By (146), $\xi_{\ell}=\frac{1}{2\gamma^{*}_{\ell}}>0$, so the complementary slackness condition (147) forces

$$\gamma^{*}_{\ell}=\lambda_{\ell}. \tag{148}$$

Hence every component is assigned zero rate, since $\frac{1}{2}\log\frac{\lambda_{\ell}}{\gamma^{*}_{\ell}}=0$.

The second minimization problem in (145) is the Lagrangian dual of a feasibility problem involving only the perception constraint. Thus, we can choose any $\hat{\lambda}^{*}_{\ell}$ that satisfies the primal constraints

$$\sum_{\ell=1}^{L}P_{\ell}(\hat{\lambda}^{*}_{\ell})\leq P \quad\text{and}\quad \hat{\lambda}^{*}_{\ell}\geq 0. \tag{149}$$

Note that, although the distortion constraint is assumed to be inactive, we must still impose it explicitly on $\hat{\lambda}^{*}_{\ell}$:

$$\sum_{\ell=1}^{L}\left(\lambda_{\ell}+\hat{\lambda}^{*}_{\ell}\right)\leq D. \tag{150}$$

This is because not every choice of $\hat{\lambda}^{*}_{\ell}$ satisfying (149) also satisfies (150). A constraint being inactive only means that, if the constraint is removed, at least one optimal solution still satisfies it automatically. Here there are multiple optimal solutions, all achieving the same objective value (zero rate), so we must restrict attention to those that satisfy (150). Note that the left-hand side of (150) is the distortion of the reconstruction at zero rate, i.e., the distortion term in (142e) evaluated at $\gamma^{*}_{\ell}=\lambda_{\ell}$.
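The zero-rate admissibility test in (149)-(150) can be sketched as a simple check. This assumes, as in (142f), that the per-component perception term is $P_\ell(\hat{\lambda}_\ell)=\frac{1}{2}\left(\frac{\hat{\lambda}_\ell}{\lambda_\ell}-1+\log\frac{\lambda_\ell}{\hat{\lambda}_\ell}\right)$; the function names are hypothetical. The choice $\hat{\lambda}^{*}_{\ell}=\lambda_{\ell}$ always satisfies (149), yet it violates (150) whenever $D<2\sum_{\ell}\lambda_{\ell}$, which is exactly why (150) must be imposed separately.

```python
import math


def kl_component(lam, lam_hat):
    """Assumed per-component KL term, matching the summand in (142f)."""
    return 0.5 * (lam_hat / lam - 1.0 + math.log(lam / lam_hat))


def zero_rate_admissible(lams, lam_hats, D, P):
    """Check (149) and (150) for a candidate zero-rate reconstruction."""
    if any(h <= 0.0 for h in lam_hats):
        return False  # the KL term is infinite at lam_hat = 0
    perception = sum(kl_component(l, h) for l, h in zip(lams, lam_hats))
    distortion = sum(l + h for l, h in zip(lams, lam_hats))  # left side of (150)
    return perception <= P and distortion <= D


lams = [1.0, 0.5]
ok_loose = zero_rate_admissible(lams, lams, D=3.0, P=0.0)  # zero KL, distortion 3.0
ok_tight = zero_rate_admissible(lams, lams, D=2.9, P=0.0)  # same choice fails (150)
```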

Appendix E Proof of Theorem 5

We now establish the RDP function with the Wasserstein-2 distance as the perception metric. The proof follows steps similar to those for the KL-divergence metric in Appendix C; only the lower-bounding steps for the perception metric need to be rewritten. Let $P_{\hat{X}^{*}_{G}|X}$ be the optimal conditional distribution of the following optimization program:

\begin{align}
R(D,P)=\inf_{P_{\hat{X}_{G}|X}}\;\; & I(X;\hat{X}_{G}), \tag{151a}\\
\text{s.t.}\;\; & \mathbb{E}[\|X-\hat{X}_{G}\|^{2}]\leq D, \tag{151c}\\
& W_{2}^{2}(P_{X},P_{\hat{X}_{G}})\leq P, \nonumber
\end{align}

where $\hat{X}_{G}$ has zero mean and is jointly Gaussian with $X$. Let $\hat{Z}^{*}_{G}=\Theta\hat{X}^{*}_{G}$, let $\Sigma_{\hat{X}^{*}_{G}}$ be the covariance matrix of $\hat{X}^{*}_{G}$, and let $\Lambda_{\hat{Z}^{*}_{G}}$ be the diagonal matrix whose diagonal elements coincide with those of $\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}$, i.e.,

$$\Lambda_{\hat{Z}^{*}_{G}}=\text{diag}^{L}(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{L}). \tag{152}$$

The lower-bounding steps for the perception metric are as follows:

\begin{align}
W_{2}^{2}(P_{X},P_{\hat{X}^{*}_{G}}) &= \text{tr}\left(\Sigma_{X}+\Sigma_{\hat{X}^{*}_{G}}-2\left(\Sigma_{X}^{\frac{1}{2}}\Sigma_{\hat{X}^{*}_{G}}\Sigma_{X}^{\frac{1}{2}}\right)^{\frac{1}{2}}\right) \tag{153}\\
&= \text{tr}\left(\Theta\Sigma_{X}\Theta^{T}+\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}-2\Theta\left(\Sigma_{X}^{\frac{1}{2}}\Sigma_{\hat{X}^{*}_{G}}\Sigma_{X}^{\frac{1}{2}}\right)^{\frac{1}{2}}\Theta^{T}\right) \tag{154}\\
&= \text{tr}\left(\Theta\Sigma_{X}\Theta^{T}+\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}-2\left(\Theta\Sigma_{X}^{\frac{1}{2}}\Sigma_{\hat{X}^{*}_{G}}\Sigma_{X}^{\frac{1}{2}}\Theta^{T}\right)^{\frac{1}{2}}\right) \tag{155}\\
&= \text{tr}\left(\Theta\Sigma_{X}\Theta^{T}+\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}-2\left(\Theta\Sigma_{X}^{\frac{1}{2}}\Theta^{T}\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}\Theta\Sigma_{X}^{\frac{1}{2}}\Theta^{T}\right)^{\frac{1}{2}}\right) \tag{156}\\
&= \text{tr}\left(\Theta\Sigma_{X}\Theta^{T}+\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}-2\left((\Theta\Sigma_{X}\Theta^{T})^{\frac{1}{2}}\Theta\Sigma_{\hat{X}^{*}_{G}}\Theta^{T}(\Theta\Sigma_{X}\Theta^{T})^{\frac{1}{2}}\right)^{\frac{1}{2}}\right) \tag{157}\\
&= W_{2}^{2}(P_{\Theta X},P_{\Theta\hat{X}^{*}_{G}}) \tag{158}\\
&= W_{2}^{2}(P_{Z},P_{\hat{Z}^{*}_{G}}) \tag{159}\\
&\geq \sum_{\ell=1}^{L}W_{2}^{2}(P_{Z_{\ell}},P_{\hat{Z}^{*}_{G,\ell}}) \tag{160}\\
&= \sum_{\ell=1}^{L}\left(\sqrt{\mathbb{E}[(Z_{\ell})^{2}]}-\sqrt{\mathbb{E}[(\hat{Z}^{*}_{G,\ell})^{2}]}\right)^{2} \tag{161}\\
&= \sum_{\ell=1}^{L}\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}}\right)^{2}, \tag{162}
\end{align}

where

  • (154) follows because the trace is invariant under unitary transformations;

  • (155) and (157) follow because $(\Theta A\Theta^{T})^{\frac{1}{2}}=\Theta A^{\frac{1}{2}}\Theta^{T}$ for any positive semidefinite matrix $A$, since $\Theta$ is a unitary matrix;

  • (156) follows because $\Theta^{T}\Theta=I$;

  • (159) follows from the definitions $Z=\Theta X$ and $\hat{Z}^{*}_{G}=\Theta\hat{X}^{*}_{G}$;

  • (160) follows from the tensorization property of Wasserstein-2 distance, i.e., for given distributions PX1X2subscript𝑃subscript𝑋1subscript𝑋2P_{X_{1}X_{2}}italic_P start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT and PY1Y2subscript𝑃subscript𝑌1subscript𝑌2P_{Y_{1}Y_{2}}italic_P start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, we have W22(PX1X2,PY1Y2)W22(PX1,PY1)+W22(PX2,PY2)superscriptsubscript𝑊22subscript𝑃subscript𝑋1subscript𝑋2subscript𝑃subscript𝑌1subscript𝑌2superscriptsubscript𝑊22subscript𝑃subscript𝑋1subscript𝑃subscript𝑌1superscriptsubscript𝑊22subscript𝑃subscript𝑋2subscript𝑃subscript𝑌2W_{2}^{2}(P_{X_{1}X_{2}},P_{Y_{1}Y_{2}})\geq W_{2}^{2}(P_{X_{1}},P_{Y_{1}})+W_% {2}^{2}(P_{X_{2}},P_{Y_{2}})italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_P start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ≥ italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_P start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) + italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_P start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT );

  • (162) follows from (2) and (152).
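The inequality in (160), and the equality case discussed next, can be sanity-checked numerically for zero-mean Gaussians. The pure-Python sketch below uses the standard closed form $W_2^2(\mathcal{N}(0,S_1),\mathcal{N}(0,S_2))=\operatorname{tr}S_1+\operatorname{tr}S_2-2\operatorname{tr}\big((S_2^{1/2}S_1S_2^{1/2})^{1/2}\big)$, restricted to $2\times 2$ covariances so that the last trace reduces to $\sqrt{\operatorname{tr}(S_1S_2)+2\sqrt{\det S_1\det S_2}}$. The covariance matrices, the rotation angle, and the helper names (`w2_sq_gauss2`, `rotate`) are illustrative assumptions, not taken from this paper. It checks that the joint distance dominates the sum of the per-coordinate distances, that a common rotation $\Theta$ leaves the distance unchanged, and that for diagonal covariances the bound is tight with the per-component form $(\sqrt{\lambda_\ell}-\sqrt{\hat\lambda_\ell})^2$.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def w2_sq_gauss2(S1, S2):
    """Squared Wasserstein-2 distance between N(0, S1) and N(0, S2) in 2D.

    For 2x2 PSD matrices, tr((S2^{1/2} S1 S2^{1/2})^{1/2}) is the sum of the
    square roots of the eigenvalues of S1 S2, i.e.
    sqrt(tr(S1 S2) + 2 sqrt(det S1 det S2)).
    """
    cross = math.sqrt(trace(matmul(S1, S2)) + 2.0 * math.sqrt(det(S1) * det(S2)))
    return trace(S1) + trace(S2) - 2.0 * cross

def rotate(S, t):
    """Return Theta S Theta^T for the planar rotation Theta by angle t."""
    c, s = math.cos(t), math.sin(t)
    Th = [[c, -s], [s, c]]
    ThT = [[c, s], [-s, c]]
    return matmul(matmul(Th, S), ThT)

S1 = [[2.0, 0.8], [0.8, 1.0]]   # illustrative covariances (assumptions)
S2 = [[1.0, -0.3], [-0.3, 2.0]]
joint = w2_sq_gauss2(S1, S2)
# Sum of the per-coordinate (marginal) distances: right-hand side of (160).
marginals = sum((math.sqrt(S1[i][i]) - math.sqrt(S2[i][i])) ** 2 for i in range(2))
rotated = w2_sq_gauss2(rotate(S1, 0.7), rotate(S2, 0.7))
diag = w2_sq_gauss2([[4.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 9.0]])
print(joint, marginals, rotated, diag)
```

With the values above, `diag` evaluates to 5.0, i.e., $(\sqrt{4}-\sqrt{1})^2+(\sqrt{1}-\sqrt{9})^2$.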

On the other hand, the inequality in (160) becomes an equality if $\hat{X}^{*}_{G}=\Theta^{T}\hat{Z}^{*}_{G}$ with $\hat{Z}^{*}_{G}$ constructed in such a way that the pairs $(Z_{\ell},\hat{Z}^{*}_{G,\ell})$, $\ell\in\{1,\ldots,L\}$, are mutually independent and their covariance matrices are given by (27). Thus, the RDP function with the Wasserstein-2 distance as the perception metric is given by the following optimization problem:

\begin{align*}
R(D,P)=\min_{\{\hat{\lambda}_{\ell},\gamma_{\ell}\}_{\ell=1}^{L}}\;\; & \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}} \tag{163a}\\
\text{s.t.}\quad & 0<\gamma_{\ell}\leq\lambda_{\ell}, \tag{163b}\\
& 0\leq\hat{\lambda}_{\ell}, \tag{163c}\\
& \sum_{\ell=1}^{L}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\right)\leq D, \tag{163d}\\
& \sum_{\ell=1}^{L}\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}}\right)^{2}\leq P. \tag{163e}
\end{align*}
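To make (163) concrete, the following sketch evaluates it by brute force for a single component ($L=1$). It is only an illustration under stated assumptions: the grids over $\gamma_\ell$ and $\hat\lambda_\ell$, the cap $\hat\lambda_\ell\le 2\lambda_\ell$, and the use of the natural logarithm are choices made here, not part of the paper's method, and the function name `rdp_scalar_grid` is hypothetical.

```python
import math

def rdp_scalar_grid(lam, D, P, n=400):
    """Brute-force evaluation of (163) for one component (L = 1).

    gamma ranges over {lam*i/n : 1 <= i <= n} and lam_hat over
    {lam*j/n : 0 <= j <= 2n}; the cap lam_hat <= 2*lam is an assumption
    that keeps the grid finite. Returns the smallest rate (nats) among
    feasible grid points, or math.inf if none is feasible.
    """
    best = math.inf
    for i in range(1, n + 1):
        gamma = lam * i / n                                            # (163b)
        for j in range(0, 2 * n + 1):
            lam_hat = lam * j / n                                      # (163c)
            dist = lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat   # (163d)
            perc = (math.sqrt(lam) - math.sqrt(lam_hat)) ** 2                 # (163e)
            if dist <= D and perc <= P:
                best = min(best, 0.5 * math.log(lam / gamma))                 # (163a)
    return best

# Perception constraint inactive (P = 1): the classical value 0.5*log(lam/D).
r_mse = rdp_scalar_grid(1.0, 0.25, 1.0)
# Perfect perception (P = 0): only lam_hat = lam is feasible on the grid.
r_perfect = rdp_scalar_grid(1.0, 0.5, 0.0)
print(r_mse, 0.5 * math.log(4.0))
print(r_perfect, 0.5 * math.log(1.0 / 0.4375))
```

With $P$ inactive, the minimum is $\frac12\log(\lambda/D)$, attained at $\gamma=D$ and $\hat\lambda=\lambda-\gamma$, since $\lambda-2\sqrt{\hat\lambda(\lambda-\gamma)}+\hat\lambda\ge\gamma$ by the AM-GM inequality. With $P=0$, the distortion constraint with $\hat\lambda=\lambda$ gives $\gamma=\lambda-(\lambda-D/2)^2/\lambda=0.4375$ for $\lambda=1$, $D=0.5$, which is a grid point.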

Appendix F Proof of Theorem 6

First, we justify that the optimization problem is convex for the Wasserstein-2 distance. The argument for the rate and distortion constraints is the same as for the KL-divergence metric. The left-hand side of the perception constraint (163e) is a sum of terms, each depending only on $\hat{\lambda}_{\ell}$, and the second derivative of $(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}})^{2}$ with respect to $\hat{\lambda}_{\ell}$ is $\frac{1}{2}\sqrt{\frac{\lambda_{\ell}}{\hat{\lambda}^{3}_{\ell}}}$, which is positive for $\hat{\lambda}_{\ell}>0$.
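The second-derivative formula can be checked with a central finite difference. In this sketch, $\lambda_\ell=2$, the evaluation points, and the step size $h=10^{-4}$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def perception_term(lam, lam_hat):
    """Per-component perception loss (sqrt(lam) - sqrt(lam_hat))**2 from (163e)."""
    return (math.sqrt(lam) - math.sqrt(lam_hat)) ** 2

def second_derivative_fd(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x); needs x - h > 0 here."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

lam = 2.0  # illustrative eigenvalue (assumption)
checks = []
for lam_hat in (0.1, 0.5, 1.0, 3.0):
    fd = second_derivative_fd(lambda x: perception_term(lam, x), lam_hat)
    exact = 0.5 * math.sqrt(lam / lam_hat ** 3)   # closed form stated in the text
    checks.append((lam_hat, fd, exact))
    print(lam_hat, fd, exact)
```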

The optimization problem can be analyzed in the same way as in Appendix D, except for the case $\nu_{1},\nu_{2}>0$, which is discussed as follows. Here, we need a different proof to show the inequality

$$\hat{\lambda}^{*}_{\ell}>0. \tag{164}$$

(The proof uses the same technique as the one showing $\gamma_{\ell}^{*}<\lambda_{\ell}$ in Appendix D-1.) Consider the following Lagrange dual optimization:

\begin{align*}
\max_{\nu_{1},\nu_{2},\eta_{\ell},\xi_{\ell}\geq 0}\;\min_{\{\gamma_{\ell},\hat{\lambda}_{\ell}\}_{\ell=1}^{L}}\;\; & \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\nu_{1}\left(\sum_{\ell=1}^{L}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\right)-D\right)\\
& +\nu_{2}\left(\sum_{\ell=1}^{L}\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}}\right)^{2}-P\right)+\sum_{\ell=1}^{L}\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell})-\sum_{\ell=1}^{L}\eta_{\ell}\hat{\lambda}_{\ell}. \tag{165}
\end{align*}

Suppose that the strict inequality in (164) does not hold for some $\ell$, i.e., $\hat{\lambda}_{\ell}^{*}=0$. We show that such a $\hat{\lambda}_{\ell}^{*}$ cannot be the optimal solution to the inner minimization problem.

The Lagrangian term in (165) depends on $\gamma_{\ell}$ and $\hat{\lambda}_{\ell}$ through the following function:

\begin{align*}
G^{\prime}_{\ell}(\gamma_{\ell},\hat{\lambda}_{\ell}) = {} & \frac{1}{2}\log\frac{\lambda_{\ell}}{\gamma_{\ell}}+\nu_{1}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}(\lambda_{\ell}-\gamma_{\ell})}+\hat{\lambda}_{\ell}\right)+\nu_{2}\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}_{\ell}}\right)^{2}\\
& +\xi_{\ell}(\gamma_{\ell}-\lambda_{\ell})-\eta_{\ell}\hat{\lambda}_{\ell}. \tag{166}
\end{align*}

We fix $\gamma_{\ell}=\gamma_{\ell}^{*}$ and then move from $\hat{\lambda}_{\ell}^{*}=0$ to $\hat{\lambda}_{\ell}^{\prime}=\epsilon$ for some small $\epsilon>0$. The leading-order change in $G^{\prime}_{\ell}(\gamma^{*}_{\ell},\hat{\lambda}_{\ell})$ is as follows:

\begin{align*}
G^{\prime}_{\ell}(\gamma^{*}_{\ell},\hat{\lambda}^{*}_{\ell})-G^{\prime}_{\ell}(\gamma^{*}_{\ell},\hat{\lambda}^{\prime}_{\ell}) &= \nu_{1}\left(2\sqrt{\epsilon(\lambda_{\ell}-\gamma_{\ell}^{*})}-\epsilon\right)+\nu_{2}\left(2\sqrt{\lambda_{\ell}\epsilon}-\epsilon\right)+\eta_{\ell}\epsilon \tag{167}\\
&= 2\left(\nu_{2}\sqrt{\lambda_{\ell}}+\nu_{1}\sqrt{\lambda_{\ell}-\gamma^{*}_{\ell}}\right)\sqrt{\epsilon}+O(\epsilon). \tag{168}
\end{align*}
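A quick numerical illustration of (167) and (168): the sketch evaluates the per-component term (166) at $\hat\lambda_\ell=0$ and at $\hat\lambda_\ell=\epsilon$ and compares the exact decrease with the leading $\sqrt{\epsilon}$ term. The multiplier and eigenvalue values are arbitrary assumptions; $\eta_\ell$ is deliberately taken large to show that the $\sqrt{\epsilon}$ term still dominates the linear terms for small $\epsilon$. The helper name `G_prime` is hypothetical.

```python
import math

def G_prime(gamma, lam_hat, lam, nu1, nu2, xi, eta):
    """Per-component Lagrangian term G'_l(gamma_l, lam_hat_l) of (166)."""
    return (0.5 * math.log(lam / gamma)
            + nu1 * (lam - 2.0 * math.sqrt(lam_hat * (lam - gamma)) + lam_hat)
            + nu2 * (math.sqrt(lam) - math.sqrt(lam_hat)) ** 2
            + xi * (gamma - lam) - eta * lam_hat)

# Illustrative values (assumptions, not taken from the paper).
lam, gamma_star, nu1, nu2, xi, eta = 1.5, 0.6, 0.4, 0.3, 0.0, 2.0
lead = 2.0 * (nu2 * math.sqrt(lam) + nu1 * math.sqrt(lam - gamma_star))  # coefficient in (168)
rows = []
for eps in (1e-2, 1e-4, 1e-6):
    # Left-hand side of (167): value at lam_hat = 0 minus value at lam_hat = eps.
    decrease = (G_prime(gamma_star, 0.0, lam, nu1, nu2, xi, eta)
                - G_prime(gamma_star, eps, lam, nu1, nu2, xi, eta))
    rows.append((eps, decrease, decrease / (lead * math.sqrt(eps))))
    print(eps, decrease, rows[-1][2])
```

A positive `decrease` means the move to $\hat\lambda_\ell=\epsilon$ lowers $G'_\ell$; the last column is the ratio of the exact decrease to the leading term of (168) and approaches 1 as $\epsilon$ shrinks.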

Thus, if $\nu_{2}>0$, the right-hand side of (168) is strictly positive for sufficiently small $\epsilon>0$, so moving to $\hat{\lambda}^{\prime}_{\ell}=\epsilon$ yields a value strictly smaller than $G^{\prime}_{\ell}(\gamma^{*}_{\ell},\hat{\lambda}^{*}_{\ell})$ while satisfying the implicit constraints. This contradicts the assumption that $\hat{\lambda}^{*}_{\ell}=0$ is the optimal solution to the inner minimization problem, which proves (164). Given the strict inequality in (164), similar to the KL-divergence metric, we can show that

$$\gamma_{\ell}^{*}<\lambda_{\ell}. \tag{169}$$

The strict inequalities in (169) and (164) imply that each component has a positive rate and, by complementary slackness, that $\xi_{\ell}=\eta_{\ell}=0$. Thus, we can write down the following KKT conditions:

\begin{align*}
\frac{1}{2\gamma^{*}_{\ell}}-\nu_{1}\sqrt{\frac{\hat{\lambda}_{\ell}^{*}}{\lambda_{\ell}-\gamma^{*}_{\ell}}} &= 0, \tag{170a}\\
\nu_{1}\left(1-\sqrt{\frac{\lambda_{\ell}-\gamma^{*}_{\ell}}{\hat{\lambda}^{*}_{\ell}}}\right)+\nu_{2}\left(1-\sqrt{\frac{\lambda_{\ell}}{\hat{\lambda}^{*}_{\ell}}}\right) &= 0, \tag{170b}\\
\sum_{\ell=1}^{L}\left(\lambda_{\ell}-2\sqrt{\hat{\lambda}_{\ell}^{*}(\lambda_{\ell}-\gamma_{\ell}^{*})}+\hat{\lambda}_{\ell}^{*}\right) &= D, \tag{170c}\\
\sum_{\ell=1}^{L}\left(\sqrt{\lambda_{\ell}}-\sqrt{\hat{\lambda}^{*}_{\ell}}\right)^{2} &= P. \tag{170d}
\end{align*}

We now derive the optimal solution. Define

$$\theta_{\ell}=\sqrt{\frac{\lambda_{\ell}-\gamma^{*}_{\ell}}{\hat{\lambda}^{*}_{\ell}}}. \tag{171}$$

Plugging the above definition into (170b) yields

$$\hat{\lambda}^{*}_{\ell}=\frac{\lambda_{\ell}}{\left(1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}\right)^{2}}. \tag{172}$$

Also, from (170a), we get

$$\gamma^{*}_{\ell}=\frac{\theta_{\ell}}{2\nu_{1}}. \tag{173}$$

Plugging (172) and (173) into (171), we get the following equation:

$$\frac{\theta_{\ell}}{1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}}=\sqrt{1-\frac{\theta_{\ell}}{2\nu_{1}\lambda_{\ell}}}. \tag{174}$$

Note that the left-hand side of (174) is increasing in $\theta_{\ell}$ on the range where its denominator $1+\frac{(1-\theta_{\ell})\nu_{1}}{\nu_{2}}$ is positive; this range contains the optimal $\theta_{\ell}$ because, by (170b), this denominator equals $\sqrt{\lambda_{\ell}/\hat{\lambda}^{*}_{\ell}}>0$. Also, the right-hand side $\sqrt{1-\frac{\theta_{\ell}}{2\nu_{1}\lambda_{\ell}}}$, defined on $\theta_{\ell}\in[0,2\nu_{1}\lambda_{\ell}]$, is decreasing in $\theta_{\ell}$. Hence (174) has at most one solution in the relevant range, so $\theta_{\ell}$ is uniquely determined.

Thus, $\hat{\lambda}^{*}_{\ell}$ and $\gamma_{\ell}^{*}$ in (172) and (173) can be obtained from $\theta_{\ell}$, which is determined via (174). This proves (51) and (52).
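The per-component recipe above (solve (174) for $\theta_\ell$, then read off $\hat\lambda^*_\ell$ and $\gamma^*_\ell$ from (172) and (173)) can be sketched as follows. This assumes $\nu_1,\nu_2>0$ are already given; finding the multipliers that satisfy (170c) and (170d) is not shown. Because the left-hand side of (174) increases and the right-hand side decreases, bisection is a natural choice here; the upper end of the bracket stops at the pole of the left-hand side when that pole comes before $2\nu_1\lambda_\ell$. The multiplier and eigenvalue values and the helper names are illustrative assumptions.

```python
import math

def solve_component(lam, nu1, nu2, iters=200):
    """Return (theta, lam_hat, gamma) for one component via (172)-(174).

    Assumes nu1 > 0 and nu2 > 0, the case treated in the text.
    """
    a = nu1 / nu2

    def h(theta):
        # Left-hand side minus right-hand side of (174).
        lhs = theta / (1.0 + (1.0 - theta) * a)
        rhs = math.sqrt(max(0.0, 1.0 - theta / (2.0 * nu1 * lam)))
        return lhs - rhs

    lo = 0.0  # h(0) = -1 < 0
    # Stop at the end of the domain of the right-hand side or at the pole of
    # the left-hand side (where 1 + (1 - theta) * a = 0), whichever is first.
    hi = min(2.0 * nu1 * lam, 1.0 + 1.0 / a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    lam_hat = lam / (1.0 + (1.0 - theta) * a) ** 2   # (172)
    gamma = theta / (2.0 * nu1)                       # (173)
    return theta, lam_hat, gamma

def kkt_residuals(lam, nu1, nu2, lam_hat, gamma):
    """Residuals of the stationarity conditions (170a) and (170b)."""
    r_a = 1.0 / (2.0 * gamma) - nu1 * math.sqrt(lam_hat / (lam - gamma))
    r_b = (nu1 * (1.0 - math.sqrt((lam - gamma) / lam_hat))
           + nu2 * (1.0 - math.sqrt(lam / lam_hat)))
    return r_a, r_b

nu1, nu2 = 0.5, 1.0            # illustrative multipliers (assumption)
results = []
for lam in (4.0, 1.0, 0.25):   # illustrative eigenvalues (assumption)
    theta, lam_hat, gamma = solve_component(lam, nu1, nu2)
    results.append((lam, lam_hat, gamma, kkt_residuals(lam, nu1, nu2, lam_hat, gamma)))
    # Per-component rate 0.5*log(lam/gamma) in nats; positive since gamma < lam.
    print(lam, lam_hat, gamma, 0.5 * math.log(lam / gamma))
```

Each component ends up with $0<\gamma^*_\ell<\lambda_\ell$ and $\hat\lambda^*_\ell>0$, in line with (164) and (169), and both residuals vanish up to rounding.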

Appendix G Proof of Corollary 1

If $P=0$, this falls under the first case in Theorem 4 and Theorem 6. Here, we have

$$R(D,0)=\frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma^{*}_{\ell}(D,0)}. \tag{175}$$

The perception constraints (45) and (55) with $P=0$ imply that $\hat{\lambda}^{*}_{\ell}(D,0)=\lambda_{\ell}$ for every $\ell\in\{1,\ldots,L\}$. Now, using the expression for the optimal $\gamma_{\ell}^{*}$ in (40) together with $\hat{\lambda}^{*}_{\ell}=\lambda_{\ell}$, we have

$$\gamma_{\ell}^{*}(D,0)=\frac{2\lambda_{\ell}}{1+\sqrt{1+16\nu_{1}^{2}\lambda_{\ell}^{2}}}, \tag{176}$$

where $\nu_{1}$ is chosen to satisfy the distortion constraints (44) and (54), i.e.,

$$D=\sum_{\ell=1}^{L}\left(2\lambda_{\ell}-2\sqrt{\lambda_{\ell}(\lambda_{\ell}-\gamma_{\ell}^{*}(D,0))}\right). \tag{177}$$

Combining the above proves the desired result.
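Corollary 1 reduces $R(D,0)$ to a one-dimensional search over $\nu_1$: by (176), $\gamma^*_\ell(D,0)$ decreases as $\nu_1$ grows, so the right-hand side of (177) is decreasing in $\nu_1$. A minimal sketch, assuming natural logarithms and a bisection on $\log\nu_1$ over an assumed bracket $[-30,30]$; the helper names and eigenvalues are hypothetical.

```python
import math

def gamma_perfect(lam, nu1):
    """(176): optimal gamma_l(D, 0) for a given multiplier nu1."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * nu1 ** 2 * lam ** 2))

def distortion_perfect(lams, nu1):
    """Right-hand side of (177)."""
    return sum(2.0 * l - 2.0 * math.sqrt(l * (l - gamma_perfect(l, nu1))) for l in lams)

def rate_perfect(lams, D, iters=200):
    """R(D, 0) in nats via (175)-(177); assumes 0 < D < 2 * sum(lams)."""
    lo, hi = -30.0, 30.0  # assumed bracket for log(nu1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if distortion_perfect(lams, math.exp(mid)) > D:
            lo = mid  # distortion too large: increase nu1
        else:
            hi = mid
    nu1 = math.exp(0.5 * (lo + hi))
    rate = 0.5 * sum(math.log(l / gamma_perfect(l, nu1)) for l in lams)  # (175)
    return rate, nu1

lams = [4.0, 1.0, 0.25]  # illustrative eigenvalues (assumption)
R, nu1 = rate_perfect(lams, D=2.0)
print(R, nu1, distortion_perfect(lams, nu1))
```

For a single component with $\lambda=1$ and $D=0.5$, `rate_perfect([1.0], 0.5)` returns $\frac12\log(1/0.4375)$, matching the closed-form inversion of (177), $\gamma^*=\lambda-(\lambda-D/2)^2/\lambda$.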

Appendix H Asymptotic Analysis for Perceptually Perfect Reconstruction

We utilize the optimal solution for the perceptually perfect reconstruction case in Corollary 1, i.e., (175), (176) and (177).

H-1 High-Distortion Compression

Let $D=\left(\sum_{\ell=1}^{L}2\lambda_{\ell}\right)-\epsilon$ for some small $\epsilon>0$. Note that, by (177), this amounts to setting $\epsilon$ to

$$\epsilon=\sum_{\ell=1}^{L}2\sqrt{\lambda_{\ell}(\lambda_{\ell}-\gamma^{*}_{\ell}(D,0))}. \tag{178}$$

In this case, $\gamma_{\ell}^{*}(D,0)$ is close to $\lambda_{\ell}$, and the rate is close to zero. By (176), this also means that $\nu_{1}$ must be close to zero. Then, we can approximate $\gamma_{\ell}^{*}(D,0)$ as follows:

\begin{align*}
\gamma_{\ell}^{*}(D,0) &= \frac{2\lambda_{\ell}}{1+\sqrt{1+16\lambda_{\ell}^{2}\nu_{1}^{2}}} \tag{179}\\
&= \frac{\lambda_{\ell}}{1+4\nu_{1}^{2}\lambda^{2}_{\ell}+O(\nu_{1}^{4})} \tag{180}\\
&= \lambda_{\ell}(1-4\nu_{1}^{2}\lambda^{2}_{\ell})+O(\nu_{1}^{4}). \tag{181}
\end{align*}

Plugging the above into (178) yields

\begin{equation}
\epsilon=4\nu_{1}\sum_{\ell=1}^{L}\lambda_{\ell}^{2}+O(\nu_{1}^{2}). \tag{182}
\end{equation}
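As an informal sanity check on (179)–(182), the short Python sketch below evaluates the exact water-level, subtracts the truncation in (181), and compares $\sum_{\ell}2\sqrt{\lambda_\ell(\lambda_\ell-\gamma_\ell^*)}$ with the linear term of (182). It assumes that (178) has the form $\epsilon=\sum_{\ell}2\sqrt{\lambda_\ell(\lambda_\ell-\gamma_\ell^*(D,0))}$, which is what the distortion expression in (190) gives at $D=2\sum_\ell\lambda_\ell-\epsilon$; the eigenvalues $(1,2,3)$ and the helper names are hypothetical choices for illustration only.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (179)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def small_nu_remainder(lam, nu):
    """Exact water-level minus the truncation lam * (1 - 4 nu^2 lam^2) in (181)."""
    return water_level(lam, nu) - lam * (1.0 - 4.0 * nu**2 * lam**2)


def excess(lams, nu):
    """epsilon = 2*sum(lam) - D, assuming D = sum(2 lam - 2 sqrt(lam (lam - gamma)))."""
    return sum(2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


if __name__ == "__main__":
    s2 = sum(lam**2 for lam in LAMS)
    for nu in (1e-2, 5e-3, 2.5e-3):
        # For an O(nu^4) remainder, r / nu^4 should level off; for this
        # formula the limit works out to 32 * lam**5.
        r = small_nu_remainder(LAMS[1], nu)
        eps_exact, eps_lin = excess(LAMS, nu), 4.0 * nu * s2  # (182) without remainder
        print(f"nu={nu:.1e}  (181) remainder/nu^4={r / nu**4:.1f}  "
              f"(182) rel. error={abs(eps_exact / eps_lin - 1):.2e}")
```

The relative error of the linear term in (182) shrinks as $\nu_1$ decreases, which is all the derivation needs.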

The rate expression can now be approximated as follows

\begin{align}
R\left(2\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,0\right) &= \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{1+\sqrt{1+16\nu_{1}^{2}\lambda_{\ell}^{2}}}{2} \tag{183}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}\log\left(1+4\nu_{1}^{2}\lambda_{\ell}^{2}+O(\nu_{1}^{4})\right) \tag{184}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}4\nu_{1}^{2}\lambda_{\ell}^{2}+O(\nu_{1}^{4}). \tag{185}
\end{align}

Now, using (182) and (185) to eliminate $\nu_{1}$, we get

\begin{equation}
R\left(2\sum_{\ell=1}^{L}\lambda_{\ell}-\epsilon,0\right)=\frac{\epsilon^{2}}{8\sum_{\ell=1}^{L}\lambda_{\ell}^{2}}+O(\epsilon^{3}). \tag{186}
\end{equation}
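The approximation (186) can also be checked numerically. The sketch below is a minimal illustration under stated assumptions: the eigenvalues $(1,2,3)$ are arbitrary, $\nu_1$ is found by bisection (a hypothetical helper, not the authors' procedure) so that $2\sum_\ell\lambda_\ell-D=\epsilon$ with $D$ given by the distortion expression written out in (190), and the resulting exact rate $\frac{1}{2}\sum_\ell\log(\lambda_\ell/\gamma_\ell^*)$, which is equivalent to (183), is compared with $\epsilon^2/(8\sum_\ell\lambda_\ell^2)$.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (179)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def excess(lams, nu):
    """epsilon = 2*sum(lam) - D, with D = sum(2 lam - 2 sqrt(lam (lam - gamma)))."""
    return sum(2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


def rate(lams, nu):
    """R = (1/2) sum log(lam / gamma), equivalent to the first line of (183)."""
    return 0.5 * sum(math.log(lam / water_level(lam, nu)) for lam in lams)


def solve_nu(lams, eps, iters=200):
    """Bisection for nu_1; excess() grows from 0 toward 2*sum(lams), so need eps < 2*sum(lams)."""
    lo, hi = 0.0, 1.0
    while excess(lams, hi) < eps:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(lams, mid) < eps else (lo, mid)
    return 0.5 * (lo + hi)


def high_distortion_rate(lams, eps):
    """Return (exact rate, prediction of (186) without its O(eps^3) term)."""
    exact = rate(lams, solve_nu(lams, eps))
    approx = eps**2 / (8.0 * sum(lam**2 for lam in lams))
    return exact, approx


if __name__ == "__main__":
    for eps in (1e-1, 1e-2):
        exact, approx = high_distortion_rate(LAMS, eps)
        print(f"eps={eps:.0e}  exact R={exact:.6e}  (186)={approx:.6e}")
```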

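To derive the expression for the water-levels, we invert (182) as $\nu_{1}=\epsilon/\left(4\sum_{\ell=1}^{L}\lambda_{\ell}^{2}\right)+O(\epsilon^{2})$ and substitute it into (181) to get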

\begin{equation}
\gamma^{*}_{\ell}\left(2\sum_{\ell'=1}^{L}\lambda_{\ell'}-\epsilon,0\right)=\lambda_{\ell}-\frac{\epsilon^{2}\lambda_{\ell}^{3}}{4\left(\sum_{\ell'=1}^{L}\lambda_{\ell'}^{2}\right)^{2}}+O(\epsilon^{3}),\quad \ell\in\{1,\ldots,L\}. \tag{187}
\end{equation}
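A similar numerical sketch can illustrate (187). Because $\gamma_\ell^*$ is close to $\lambda_\ell$ here, the code compares the gaps $\lambda_\ell-\gamma_\ell^*$ rather than the water-levels themselves. The eigenvalues $(1,2,3)$, the bisection solver, and the function names are hypothetical; the distortion expression is assumed to be the one written out in (190). The gaps grow like $\lambda_\ell^3$, so the water-levels are unequal across components.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (179)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def excess(lams, nu):
    """epsilon = 2*sum(lam) - D, with D = sum(2 lam - 2 sqrt(lam (lam - gamma)))."""
    return sum(2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


def solve_nu(lams, eps, iters=200):
    """Bisection for nu_1 (requires 0 < eps < 2*sum(lams))."""
    lo, hi = 0.0, 1.0
    while excess(lams, hi) < eps:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(lams, mid) < eps else (lo, mid)
    return 0.5 * (lo + hi)


def water_level_gaps(lams, eps):
    """Return exact and predicted values of lam_l - gamma_l^* for every component."""
    nu = solve_nu(lams, eps)
    s2 = sum(lam**2 for lam in lams)
    exact = [lam - water_level(lam, nu) for lam in lams]
    predicted = [eps**2 * lam**3 / (4.0 * s2**2) for lam in lams]  # (187), O(eps^3) dropped
    return exact, predicted


if __name__ == "__main__":
    exact, predicted = water_level_gaps(LAMS, 1e-2)
    for lam, e, p in zip(LAMS, exact, predicted):
        print(f"lambda={lam}  exact gap={e:.6e}  (187) gap={p:.6e}")
```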

H-2 Low-Distortion Compression

Let $D=\epsilon$ for some small $\epsilon>0$. Note that as $\epsilon\rightarrow 0$, we must have $\gamma_{\ell}^{*}\rightarrow 0$ by (177), and consequently $\nu_{1}\rightarrow\infty$ by (176). In this regime, we can approximate the water-levels in (176) as follows:

\begin{align}
\gamma_{\ell}^{*}(D,0) &= \frac{2\lambda_{\ell}}{1+\sqrt{1+16\lambda_{\ell}^{2}\nu_{1}^{2}}} \tag{188}\\
&= \frac{1}{2\nu_{1}}-\frac{1}{8\nu_{1}^{2}\lambda_{\ell}}+O\left(\frac{1}{\nu_{1}^{3}}\right). \tag{189}
\end{align}
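To illustrate the large-$\nu_1$ expansion (189), the sketch below subtracts its two leading terms from the exact water-level (188) and scales the remainder by $\nu_1^3$. The value $\lambda_\ell=2$ and the function names are illustrative assumptions. Writing (188) as $\gamma_\ell^*=(\sqrt{1+16\lambda_\ell^2\nu_1^2}-1)/(8\lambda_\ell\nu_1^2)$ shows that the scaled remainder should approach $1/(64\lambda_\ell^2)$, consistent with the $O(1/\nu_1^3)$ term.

```python
import math


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (188)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def large_nu_remainder(lam, nu):
    """Exact water-level minus the two leading terms of (189)."""
    return water_level(lam, nu) - (1.0 / (2.0 * nu) - 1.0 / (8.0 * nu**2 * lam))


if __name__ == "__main__":
    lam = 2.0  # hypothetical eigenvalue lambda_l
    for nu in (10.0, 100.0, 1000.0):
        r = large_nu_remainder(lam, nu)
        # For an O(1/nu^3) remainder, r * nu^3 should level off near
        # 1 / (64 * lam**2) = 0.00390625 for lam = 2.
        print(f"nu={nu:g}  remainder={r:.3e}  remainder*nu^3={r * nu**3:.8f}")
```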

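Plugging (189) into the distortion constraint (177) and using the small-$\gamma$ expansion $2\lambda_{\ell}-2\sqrt{\lambda_{\ell}(\lambda_{\ell}-\gamma)}=\gamma+\frac{\gamma^{2}}{4\lambda_{\ell}}+O(\gamma^{3})$, we have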

\begin{align}
\epsilon &= \sum_{\ell=1}^{L}\left(2\lambda_{\ell}-2\sqrt{\lambda_{\ell}\left(\lambda_{\ell}-\gamma_{\ell}^{*}(D,0)\right)}\right) \tag{190}\\
&= \frac{L}{2\nu_{1}}-\frac{1}{16\nu_{1}^{2}}\sum_{\ell=1}^{L}\frac{1}{\lambda_{\ell}}+O\left(\frac{1}{\nu_{1}^{3}}\right), \tag{191}
\end{align}

which, after inverting the series in $1/\nu_{1}$, implies

\begin{equation}
\frac{1}{\nu_{1}}=\frac{2\epsilon}{L}+\frac{\epsilon^{2}}{2L^{3}}\sum_{\ell=1}^{L}\frac{1}{\lambda_{\ell}}+O(\epsilon^{3}). \tag{192}
\end{equation}
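The series inversion leading to (192) can be checked numerically. In the sketch below, $\nu_1$ is obtained by bisection (a hypothetical helper) so that the distortion expression in (190) equals $D=\epsilon$, and $1/\nu_1$ is compared with the one-term and two-term expansions of (192). The eigenvalues $(1,2,3)$ are an arbitrary assumption; adding the $\epsilon^2$ term should reduce the error by roughly a factor of $\epsilon$.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (188)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def distortion(lams, nu):
    """D = sum(2 lam - 2 sqrt(lam (lam - gamma))), the first line of (190)."""
    return sum(2.0 * lam - 2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


def solve_nu(lams, d, iters=200):
    """Bisection for nu_1; distortion() falls from 2*sum(lams) toward 0, so need d > 0."""
    lo, hi = 0.0, 1.0
    while distortion(lams, hi) > d:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if distortion(lams, mid) > d else (lo, mid)
    return 0.5 * (lo + hi)


def inverse_nu_errors(lams, eps):
    """Errors of the one-term and two-term expansions of 1/nu_1 in (192) at D = eps."""
    L = len(lams)
    a = sum(1.0 / lam for lam in lams)
    inv_nu = 1.0 / solve_nu(lams, eps)
    one_term = 2.0 * eps / L
    two_term = one_term + eps**2 * a / (2.0 * L**3)
    return abs(inv_nu - one_term), abs(inv_nu - two_term)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2):
        err1, err2 = inverse_nu_errors(LAMS, eps)
        print(f"eps={eps:.0e}  one-term error={err1:.3e}  two-term error={err2:.3e}")
```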

Substituting (192) into (189) shows that the water-levels in the low-distortion regime are given by

\begin{equation}
\gamma_{\ell}^{*}(\epsilon,0)=\frac{\epsilon}{L}-\frac{\epsilon^{2}}{2L^{2}\lambda_{\ell}}+\frac{\epsilon^{2}}{4L^{3}}\sum_{\ell'=1}^{L}\frac{1}{\lambda_{\ell'}}+O(\epsilon^{3}),\quad \ell\in\{1,\ldots,L\}. \tag{193}
\end{equation}
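The sketch below compares the exact low-distortion water-levels with (193), reusing the same hypothetical bisection approach as a stand-in for solving (176)–(177); the eigenvalues $(1,2,3)$ and function names are illustrative assumptions. It also shows that the water-levels differ across components and increase with $\lambda_\ell$, as the $-\epsilon^2/(2L^2\lambda_\ell)$ term predicts.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (188)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def distortion(lams, nu):
    """D = sum(2 lam - 2 sqrt(lam (lam - gamma))), the first line of (190)."""
    return sum(2.0 * lam - 2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


def solve_nu(lams, d, iters=200):
    """Bisection for nu_1 such that distortion(lams, nu_1) = d (requires d > 0)."""
    lo, hi = 0.0, 1.0
    while distortion(lams, hi) > d:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if distortion(lams, mid) > d else (lo, mid)
    return 0.5 * (lo + hi)


def low_distortion_water_levels(lams, eps):
    """Exact water-levels at D = eps and the prediction of (193) without O(eps^3)."""
    L = len(lams)
    a = sum(1.0 / lam for lam in lams)
    nu = solve_nu(lams, eps)
    exact = [water_level(lam, nu) for lam in lams]
    predicted = [eps / L - eps**2 / (2.0 * L**2 * lam) + eps**2 * a / (4.0 * L**3) for lam in lams]
    return exact, predicted


if __name__ == "__main__":
    exact, predicted = low_distortion_water_levels(LAMS, 1e-2)
    for lam, e, p in zip(LAMS, exact, predicted):
        print(f"lambda={lam}  exact={e:.10f}  (193)={p:.10f}")
```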

The rate expression can now be approximated as follows

\begin{align}
R(\epsilon,0) &= \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{\lambda_{\ell}}{\gamma^{*}_{\ell}(\epsilon,0)} \tag{194}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{L\lambda_{\ell}}{\epsilon}-\frac{1}{2}\sum_{\ell=1}^{L}\log\left(1-\frac{\epsilon}{2L\lambda_{\ell}}+\frac{\epsilon}{4L^{2}}\sum_{\ell'=1}^{L}\frac{1}{\lambda_{\ell'}}+O(\epsilon^{2})\right) \tag{195}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{L\lambda_{\ell}}{\epsilon}-\frac{1}{2}\sum_{\ell=1}^{L}\left(-\frac{\epsilon}{2L\lambda_{\ell}}+\frac{\epsilon}{4L^{2}}\sum_{\ell'=1}^{L}\frac{1}{\lambda_{\ell'}}\right)+O(\epsilon^{2}) \tag{196}\\
&= \frac{1}{2}\sum_{\ell=1}^{L}\log\frac{L\lambda_{\ell}}{\epsilon}+\frac{\epsilon}{8L}\sum_{\ell=1}^{L}\frac{1}{\lambda_{\ell}}+O(\epsilon^{2}). \tag{197}
\end{align}

This concludes the proof.
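As a final numerical illustration of the low-distortion rate (197), the sketch below solves for $\nu_1$ at $D=\epsilon$ by bisection (a hypothetical helper), evaluates the exact rate from the first line of (194), and compares it with the leading term $\frac{1}{2}\sum_\ell\log(L\lambda_\ell/\epsilon)$ alone and with the first-order correction $\frac{\epsilon}{8L}\sum_\ell 1/\lambda_\ell$ added. The eigenvalues $(1,2,3)$ and the function names are assumptions made for illustration.

```python
import math

LAMS = (1.0, 2.0, 3.0)  # hypothetical eigenvalues lambda_1..lambda_L


def water_level(lam, nu):
    """Exact water-level gamma_l^*(D, 0) from (188)."""
    return 2.0 * lam / (1.0 + math.sqrt(1.0 + 16.0 * lam**2 * nu**2))


def distortion(lams, nu):
    """D = sum(2 lam - 2 sqrt(lam (lam - gamma))), the first line of (190)."""
    return sum(2.0 * lam - 2.0 * math.sqrt(lam * (lam - water_level(lam, nu))) for lam in lams)


def solve_nu(lams, d, iters=200):
    """Bisection for nu_1 such that distortion(lams, nu_1) = d (requires d > 0)."""
    lo, hi = 0.0, 1.0
    while distortion(lams, hi) > d:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if distortion(lams, mid) > d else (lo, mid)
    return 0.5 * (lo + hi)


def rate(lams, nu):
    """R = (1/2) sum log(lam / gamma), the first line of (194)."""
    return 0.5 * sum(math.log(lam / water_level(lam, nu)) for lam in lams)


def low_distortion_rate(lams, eps):
    """Return (exact rate, leading term, leading term plus the O(eps) correction of (197))."""
    L = len(lams)
    a = sum(1.0 / lam for lam in lams)
    exact = rate(lams, solve_nu(lams, eps))
    leading = 0.5 * sum(math.log(L * lam / eps) for lam in lams)
    return exact, leading, leading + eps * a / (8.0 * L)


if __name__ == "__main__":
    exact, leading, refined = low_distortion_rate(LAMS, 1e-2)
    print(f"exact={exact:.8f}  leading={leading:.8f}  (197)={refined:.8f}")
```

With these illustrative values, the first-order term accounts for almost all of the gap between the exact rate and the leading logarithmic term.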

References

  • [1] E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. Van Gool, “Generative adversarial networks for extreme learned image compression,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 221–231.
  • [2] J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–27.
  • [3] L. Theis, W. Shi, A. Cunningham, and F. Huszár, “Lossy image compression with compressive autoencoders,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017.
  • [4] F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool, “Conditional probability models for deep image compression,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 4394–4402.
  • [5] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in Proc. ACM Int. Conf. Mach. Learn. (ICML), 2019, pp. 675–685.
  • [6] N. Saldi, T. Linder, and S. Yüksel, “Output constrained lossy source coding with limited common randomness,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4984–4998, Jun. 2015.
  • [7] L. Theis and A. Wagner, “A coding theorem for the rate-distortion-perception function,” in Neural Compression Workshop of Int. Conf. Learn. Represent. (ICLR), 2021, p. 9.
  • [8] C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, Nov. 2018.
  • [9] G. Zhang, J. Qian, J. Chen, and A. Khisti, “Universal rate-distortion-perception representations for lossy compression,” in Proc. Adv. Neural Inf. Process. Sys. (NeurIPS), 2021, pp. 11517–11529.
  • [10] A. B. Wagner, “The rate-distortion-perception tradeoff: The role of common randomness,” arXiv:2202.04147, 2022.
  • [11] J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, “On the rate-distortion-perception function,” IEEE J. Sel. Areas Inf. Theory, vol. 3, no. 4, pp. 664–673, Dec. 2022.
  • [12] D. Freirich, T. Michaeli, and R. Meir, “A theory of the distortion-perception tradeoff in Wasserstein space,” in Proc. Adv. Neural Inf. Process. Sys. (NeurIPS), vol. 34, 2021, pp. 25661–25672.
  • [13] Z. Yan, F. Wen, R. Ying, C. Ma, and P. Liu, “On perceptual lossy compression: The cost of perceptual reconstruction and an optimal training framework,” in Proc. ACM Int. Conf. Mach. Learn. (ICML), 2021, pp. 11682–11692.
  • [14] H. Liu, G. Zhang, J. Chen, and A. Khisti, “Lossy compression with distribution shift as entropy constrained optimal transport,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2022, pp. 1–6.
  • [15] S. Salehkalaibar, B. Phan, J. Chen, W. Yu, and A. Khisti, “On the choice of perception loss function for learned video compression,” in Proc. Adv. Neural Inf. Process. Sys. (NeurIPS), 2023.
  • [16] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
  • [17] L. Song, J. Chen, and C. Tian, “Broadcasting correlated vector Gaussians,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2465–2477, May 2015.