Staggered Quantizers for Perfect Perceptual Quality: A Connection between Quantizers with Common Randomness and Without

Ruida Zhou Department of Electrical Engineering
University of California, Los Angeles
Los Angeles, CA
[email protected]
   Chao Tian Department of Electrical and Computer Engineering
Texas A&M University
College Station, TX
[email protected]
Abstract

The rate-distortion-perception (RDP) framework has attracted significant recent attention due to its application in neural compression. It is important to understand the underlying mechanism connecting procedures with common randomness and those without. Different from previous efforts, we study this problem from a quantizer design perspective. By analyzing an idealized setting, we provide an interpretation of the advantage of dithered quantization in the RDP setting, which further allows us to make a conceptual connection between randomized (dithered) quantizers and quantizers without common randomness. This new understanding leads to a new procedure for RDP coding based on staggered quantizers.

I Introduction

Compression plays an important role in the efficient representation of information content, particularly visual content. Traditionally, the tradeoff between the compression rate and the incurred distortion has been studied under two different but related frameworks: the quantization framework [1] and the rate-distortion theory [2] framework. In the former, the focus is on the design of quantizers that compress data samples one at a time (i.e., scalar quantization) or a few at a time (i.e., vector quantization), while the latter focuses on the fundamental limits of lossy compression by allowing an asymptotically large number of samples to be encoded together.

Largely driven by the recent emergence of neural compression, the issue of perceptual quality has led to the study of the rate-distortion-perception (RDP) tradeoff [3, 4, 5, 6, 7, 8, 9, 10]. In this formulation, a new quality constraint, introduced to capture the perceptual quality loss due to compression, is imposed in addition to the existing objective distortion constraint. Mathematically, this formulation [3] requires the probability distribution of the content after decompression to be close to that of the source content before compression; the case when the two distributions are exactly the same is often referred to as “perfect perceptual quality”, which is our focus in this work.

The RDP problem has attracted significant recent research attention, and several studies in this area revealed that common randomness plays an important role in this setting [5, 6]. More precisely, the lack of common randomness can cause significant performance loss compared to methods that have such common randomness at their disposal, and this loss is particularly severe for scalar quantization. There are two known prevailing methods of introducing common randomness for RDP coding. The first is based on probabilistic sampling [11], and the second is through universal dithered quantization [12, 13]. The first approach requires knowledge of a target joint distribution between the samples and the compressed version, and furthermore involves a rather complex sampling procedure. The dither-based approach, on the other hand, is simpler to implement and thus more attractive. However, its architecture places an inherent constraint on the eventual probability distribution, and, though it is widely used, it is not clear what actually makes it suitable for the RDP setting.

One piece of the puzzle has thus far been missing between the compression procedures without common randomness (e.g., scalar quantization with a deterministic encoder) and those with a large amount of common randomness (dithered quantizers), particularly from a quantizer design perspective. That is, quantizers with deterministic encoders require no common randomness, whereas the dither-based approach utilizes common randomness on an uncountable set in a less transparent manner. What exactly is the underlying mechanism that lends the dither-based approach the advantage, and is there an effective procedure with an intermediate amount of common randomness? Although these questions have previously been studied under the rate-distortion framework with asymptotically large sample block sizes [14], the asymptotic nature of such analysis makes the mechanism rather opaque.

In this work, we develop a better understanding of these issues under the quantization framework. Using a decomposition perspective, we provide a new way to understand the mechanism from which procedures utilizing common randomness obtain the advantage. We first focus on an idealized setting on the unit circle, and provide a complete analysis of the performance. Based on these understandings, we provide a new approach to introduce common randomness using staggered quantizers. We further discuss the application of such an approach to other sources. It should be noted that staggered quantizers have been previously used for multiple description coding [15, 16, 17] which offered surprisingly competitive performance compared to more sophisticated approaches.

II Backgrounds

II-A Rate-distortion function and quantizers

Let the data source $X$ be a real-valued random variable, with a distribution $P_X$ on the alphabet $\mathcal{X}$. The reconstruction alphabet is denoted as $\hat{\mathcal{X}}$. Given a distortion measure $d:\mathcal{X}\times\hat{\mathcal{X}}\rightarrow[0,\infty)$, e.g., the squared error distortion $d(x,\hat{x})=(x-\hat{x})^{2}$ when $\mathcal{X}=\hat{\mathcal{X}}=\mathbb{R}$, the (informational) rate-distortion function under a distortion constraint $D$ is defined as

$$R(D)=\min_{P_{\hat{X}|X}:\,\mathbb{E}d(X,\hat{X})\leq D}I(X;\hat{X}),$$

where $I(\cdot;\cdot)$ is the mutual information function.

Rate-distortion theory deals with the setting in which an infinite number of samples is allowed to be encoded together. In practice, samples are usually encoded one or a few at a time, referred to as scalar quantization and vector quantization, respectively. In particular, a scalar quantizer consists of an encoding mapping $f:\mathcal{X}\rightarrow\mathbb{Z}$, which determines the representation index to assign to a sample, and a decoding function $g:\mathbb{Z}\rightarrow\hat{\mathcal{X}}$, which assigns a reconstruction point to each representation index. Therefore, $\hat{X}=g(f(X))$. Indices are allowed to be further entropy-coded, e.g., using a Huffman code. When entropy coding is allowed, the scheme is usually referred to as entropy-constrained scalar quantization (ECSQ), whereas when the number of quantization levels is fixed, it is usually referred to as fixed-rate quantization.

A universal dithered quantizer utilizes a uniform quantizer with stepsize $\Delta$ in the encoding and decoding process [18]. Different from classic deterministic quantizers, a random noise $Z$, independent of the data samples and uniformly distributed on the base interval $(-\Delta/2,\Delta/2]$, is available at both the encoder and the decoder. The noise $Z$ is first added to the sample as $X+Z$, which is then quantized to its nearest neighbor using the deterministic uniform quantizer, and finally the same dither noise $Z$ is subtracted at the decoder. It was shown [12, 13] that this procedure yields $\hat{X}=X+\tilde{Z}$, where $\tilde{Z}$ has the same marginal probability distribution as $Z$ and is also independent of $X$, and that, conditioned on the common randomness, the optimal entropy coding rate (of the lattice index) is exactly $H(f(X+Z)|Z)=I(X;X+Z)$. Note that such a rate is impossible to achieve in practice, since it requires a separate entropy code for each specific realization of the noise $Z=z$. Firstly, the usual technique of universal compression becomes unrealistic, because identical noise realizations occur with probability zero and therefore very few samples are available to estimate the corresponding probability distribution; secondly, unless the distribution is analytically simple, storing the distribution or the entropy coding codewords for each noise realization is also unrealistic. Entropy coding of $f(X+Z)$ can be considered instead, resulting in a rate of $H(f(X+Z))$.
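The dithering procedure described above can be illustrated with a minimal sketch. The function names, the Gaussian test source, the stepsize $\Delta=0.5$ and the sample count are all assumptions made for illustration and do not come from the original text. The sketch only checks two consequences of the stated property $\hat{X}=X+\tilde{Z}$: the reconstruction error never exceeds $\Delta/2$ in magnitude, and its mean square is close to $\Delta^2/12$, the variance of a uniform variable on an interval of length $\Delta$.

```python
import math
import random

def dithered_quantize(x, delta, z):
    """Encoder: index of the uniform-quantizer point nearest to x + z."""
    return math.floor((x + z) / delta + 0.5)

def dithered_reconstruct(index, delta, z):
    """Decoder: map the index to its lattice point and subtract the shared dither."""
    return index * delta - z

def simulate_dither_errors(num_samples=100_000, delta=0.5, seed=0):
    """Return the reconstruction errors x_hat - x for an illustrative Gaussian source."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)                  # assumed source, for illustration only
        z = rng.uniform(-delta / 2, delta / 2)   # dither known to encoder and decoder
        index = dithered_quantize(x, delta, z)
        errors.append(dithered_reconstruct(index, delta, z) - x)
    return errors

errors = simulate_dither_errors()
max_abs_error = max(abs(e) for e in errors)
mean_sq_error = sum(e * e for e in errors) / len(errors)
print(max_abs_error, mean_sq_error)
```

Checking that the error is also independent of $X$ would need a further test (for example, comparing error histograms across different ranges of $x$), which is omitted here for brevity.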

II-B Rate-distortion-perception function and RDP coding

The (informational) rate-distortion-perception function can be viewed as a generalization of the rate-distortion function, which under a given distortion constraint $D$ and a given perception constraint $P$, is defined as

$$R(D,P)=\min_{P_{\hat{X}|X}:\,\mathbb{E}d(X,\hat{X})\leq D,\;w(P_{X},P_{\hat{X}})\leq P}I(X;\hat{X}),\tag{1}$$

where $w(\cdot,\cdot)$ is a measure quantifying the distance between two probability distributions, e.g., KL divergence, total variation, or Wasserstein metric. We are mainly interested in the case of perfect perception, i.e.,

$$R(D,0)=\min_{P_{\hat{X}|X}:\,\mathbb{E}d(X,\hat{X})\leq D,\;P_{\hat{X}}=P_{X}}I(X;\hat{X}),\tag{2}$$

which is independent of the choice of the measure $w(\cdot,\cdot)$. Similar to the rate-distortion setting, it was shown [19] that the RDP function is also the fundamental limit of any encoding and decoding function pairs in the RDP setting. It was established in [20] that under the MSE distortion measure, $R(D,0)=R(\frac{D}{2},\infty)$. These results are again asymptotic in nature, meaning the corresponding codes are allowed to encode a large number of samples together.

For scalar quantization (also called one-shot coding), it is possible to achieve the following coding rate [19]: $R(D,P)+\log(R(D,P)+1)+4$, using the sampling-based approach mentioned earlier, which is at a higher rate than the RDP function. The loss can be significant at the usual range of practical compression applications, e.g., at a target rate of $4$ bits with a potential loss of more than $4$ bits. It is not known whether this is the best rate possible for one-shot coding.
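To make the size of this gap concrete, the short sketch below evaluates the one-shot bound for a few values of the RDP function. It assumes that rates are measured in bits and that the logarithm is base 2, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def one_shot_rate_bound(rdp_rate_bits):
    """One-shot achievable rate R + log2(R + 1) + 4, where R = R(D, P) in bits.

    Assumption: the logarithm is base 2 because the rate is expressed in bits.
    """
    return rdp_rate_bits + math.log2(rdp_rate_bits + 1.0) + 4.0

for rate in (1.0, 4.0, 16.0):
    bound = one_shot_rate_bound(rate)
    print(f"R(D,P) = {rate:4.1f} bits, bound = {bound:6.3f} bits, overhead = {bound - rate:5.3f} bits")
```

Under these assumptions the overhead at $R(D,P)=4$ bits is $\log_2 5+4\approx 6.3$ bits, consistent with the remark that the loss can exceed $4$ bits.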

It has been shown that quantizers without common randomness can suffer significantly in RDP coding, and that common randomness is important. The dithered quantizer appears to be a natural match and can be utilized. However, the output of the original dithered quantizer has the same distribution as $X+\tilde{Z}$, and therefore there is a mismatch with the target RDP-optimal distribution. In particular, for the perfect perceptual quality setting, the distribution of $X+\tilde{Z}$ may be different from $P_{X}$, and a distribution shaping procedure is needed at the decoder, at the expense of increased distortion. This shaping can be accomplished using a nonlinear function $\phi(\cdot)$ operating on the output of the dithered quantizer $X+\tilde{Z}$, and neural networks can be used to fulfill this role.

II-C Quantization on the unit circle

Consider the following idealized unit-circle setting: the data signal $X$ to be compressed is uniformly distributed over the unit circle $\mathcal{X}=\{x\in\mathbb{R}^{2}:\|x\|_{2}=1\}$. The distortion is measured using the squared error function $d(x,\hat{x})=\|x-\hat{x}\|_{2}^{2}$, the coding rate is set at $1$ bit per sample, and the reconstruction $\hat{X}$ is required to be of perfect perception quality, i.e., $\hat{X}\stackrel{d}{=}X$. Since the domain of the signal is the unit circle, we can represent any $x\in\mathcal{X}$ by its angle $\theta(x)\in\Theta\triangleq(-\pi,\pi]$ such that $x=(\cos(\theta(x)),\sin(\theta(x)))$. Fixed-rate quantization at rate 1 on this data source was previously considered in [5] to illustrate the advantage of stochastic (dithered) encoders. Two types of quantizers were considered in [5]:

  • Quantizer with a deterministic encoder: Since there is no common randomness, to obtain perfect perception quality, decoder side noise must be injected. It was shown that the optimal quantization procedure in this case is as follows:

    $$f(\theta(x))=\begin{cases}1 & \theta(x)\in[0,\pi)\\ -1 & \text{otherwise}\end{cases},\quad g(i)=\frac{i\times\pi}{2}-\bar{Z},$$

    where $\bar{Z}$ is a private random variable at the decoder side, independent of $X$, distributed uniformly on $[-\pi/2,\pi/2)$. Here we view $g(i)$ as a random function, and therefore do not include $\bar{Z}$ as part of the function input. This procedure gives a distortion of $2-8/\pi^{2}$.

  • Dithered quantizer: Let $Z$ be distributed uniformly over $[-\pi/2,\pi/2)$, independent of $X$. Dithered quantization operates as follows:

    $$f(Y)=\begin{cases}1 & Y\in[0,\pi) \bmod 2\pi\\ -1 & \text{otherwise}\end{cases},\quad g(i)=\frac{i\times\pi}{2}-Z,$$

    where $Y=\theta(x)+Z$ and $\theta(\hat{x})=g(f(\theta(x)+Z))$. By the property of the dithered quantizer, we have $\theta(\hat{X})=\theta(X)+\tilde{Z} \bmod 2\pi$, where $\tilde{Z}\stackrel{d}{=}Z$ and is independent of $X$. The distortion thus induced is $2-4/\pi$, which is about 38.9% lower than that using the deterministic encoder.

The dithered quantizer performs better here for two reasons: 1) The distribution of $\theta(X)+\tilde{Z} \bmod 2\pi$ is uniform on the unit circle, and thus naturally matches the perceptual requirement; 2) If the perception consideration were not present, the first approach could choose a single reconstruction point to minimize the distortion; however, it is now forced to utilize private randomness at the decoder, over $1/2$ of the unit circle, to produce the desired distribution, and this private randomness induces additional distortion. Fig. 1 (a) and (b) illustrate this effect of the two procedures.
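The two distortion values quoted above, $2-8/\pi^{2}$ and $2-4/\pi$, can be checked with a small Monte Carlo sketch of the two 1-bit quantizers. The helper names, the sample count and the seed are assumptions for illustration; the encoder and decoder follow the two definitions in the list above, with all angles interpreted modulo $2\pi$.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def sq_dist(theta, theta_hat):
    """Squared Euclidean distance between two unit-circle points given by their angles."""
    return 2.0 * (1.0 - math.cos(theta - theta_hat))

def encode(angle):
    """1-bit encoder f: +1 if the angle (mod 2*pi) lies in [0, pi), and -1 otherwise."""
    return 1 if 0.0 <= angle % TWO_PI < math.pi else -1

def simulate(num_samples=200_000, seed=0):
    """Return (distortion with private decoder noise, distortion with shared dither)."""
    rng = random.Random(seed)
    d_private = 0.0
    d_dither = 0.0
    for _ in range(num_samples):
        theta = rng.uniform(-math.pi, math.pi)
        noise = rng.uniform(-math.pi / 2, math.pi / 2)
        # Deterministic encoder; the noise is private to the decoder: g(i) = i*pi/2 - noise.
        d_private += sq_dist(theta, encode(theta) * math.pi / 2 - noise)
        # Dithered quantizer; the same noise is added before encoding and subtracted after.
        d_dither += sq_dist(theta, encode(theta + noise) * math.pi / 2 - noise)
    return d_private / num_samples, d_dither / num_samples

d_private, d_dither = simulate()
print(d_private, 2.0 - 8.0 / math.pi ** 2)
print(d_dither, 2.0 - 4.0 / math.pi)
```

The only difference between the two branches is whether the encoder sees the noise, which is exactly the distinction between private and common randomness drawn in the text.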

III Quantization on the unit circle

Figure 1: 1-bit quantizers on the unit circle with perfect perceptual quality. Panels: (a) quantizer with private randomness; (b) dithered quantization; (c) dissection of dithered quantization. “×” indicates a sample realization of $X$; “•” indicates the distribution of the reconstruction $\hat{X}$; red and blue regions indicate the partition regions associated with indices $+1$ and $-1$, respectively. In (a), the deterministic encoder is used. The sample is encoded as $+1$ and its reconstruction is distributed uniformly over the red region. In (b), the dithered approach is used, and the reconstruction is distributed uniformly over the arc centered at the sample. There are no clear partitions in this case, and thus purple is used as a mixture of red and blue regions. In (c), “∘” indicates realizations of the negative common randomness $-Z$, and the dithered quantization is viewed as a mixture of uncountably many deterministic quantizers, each associated with a realization of $Z$.

Figure 2: Staggered quantizers with $1$ bit coding rate and $2$ bits common randomness.

III-A Noise realization and staggered quantizers

Consider again the unit circle setting at rate 1. An alternative view of a quantizer with common randomness is to consider the quantizer induced by fixing a realization of the common randomness $Z=z$, which is illustrated in Fig. 1 (c). It is seen that the partitions of these quantizers are in fact congruent to the one shown in Fig. 1 (a). Since $Z$ is uniformly distributed on $[-\pi/2,\pi/2)$, the dithered quantization procedure is in fact mixing uncountably many such quantizers, one for each $z\in[-\pi/2,\pi/2)$. Due to the common randomness $Z$, there is no need to inject decoder side randomness, which helps reduce the resultant distortion.

The two types of quantizers considered in [5] can then be viewed as two extremes of a class of quantizers: the former is a single quantizer with a deterministic encoder that relies solely on decoder side randomness for perception, while the latter is mixing (randomly selected using the common randomness) among uncountably many quantizers each with a deterministic encoder that requires no decoder side randomness. In between the two extremes, we can consider mixing staggered quantizers with deterministic encoders, which will need to rely on decoder side randomness to some extent. One such example with $N=4$ quantizers is illustrated in Fig. 2. It can be seen that each individual quantizer only requires the decoder side randomness to be uniformly distributed on $1/8$ of the unit circle, instead of $1/2$ of the unit circle. As discussed earlier, decoder side randomness induces additional distortion, and this reduction in its range helps to reduce the distortion. As we increase the number of quantizers, the distortion is further reduced, eventually approaching that of the dithered quantizer.

III-B Staggered quantizers on the unit circle

Generalizing the idea shown in Fig. 2, we can use $N$ staggered $L$-level quantizers, each of which uniformly partitions the unit circle. The $N$ quantizers are obtained by sequentially offsetting the partition by an angle of $2\pi/(LN)$ on the unit circle. The common randomness uniformly selects one of the $N$ quantizers, and the decoder adds private random noise uniformly distributed on $1/(LN)$ of the unit circle (which is $1/(2N)$ in the $1$-bit case $L=2$).
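The construction above can be sketched in a few lines. The choices below are assumptions made for illustration rather than details given in the text: quantizer $k\in\{0,\dots,N-1\}$ starts its first cell at angle $2\pi k/(LN)$, cells are indexed $0,\dots,L-1$, the reconstruction is the cell centre plus private noise on an arc of length $2\pi/(LN)$, and all function names are hypothetical. The empirical distortion can then be compared against the closed form of Theorem 1.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def staggered_encode(theta, k, num_levels, num_quantizers):
    """Index of the cell of quantizer k that contains theta (cells have width 2*pi/L)."""
    offset = k * TWO_PI / (num_levels * num_quantizers)
    cell = int(((theta - offset) % TWO_PI) // (TWO_PI / num_levels))
    return cell % num_levels  # guards against a floating-point result equal to L

def staggered_decode(index, k, num_levels, num_quantizers, rng):
    """Cell centre of quantizer k plus private noise on an arc of length 2*pi/(L*N)."""
    offset = k * TWO_PI / (num_levels * num_quantizers)
    centre = offset + (index + 0.5) * TWO_PI / num_levels
    half_arc = math.pi / (num_levels * num_quantizers)
    return centre + rng.uniform(-half_arc, half_arc)

def simulate_staggered(num_levels, num_quantizers, num_samples=200_000, seed=0):
    """Monte Carlo estimate of the squared-error distortion of the staggered scheme."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        theta = rng.uniform(-math.pi, math.pi)
        k = rng.randrange(num_quantizers)  # common randomness: log2(N) shared bits
        index = staggered_encode(theta, k, num_levels, num_quantizers)
        theta_hat = staggered_decode(index, k, num_levels, num_quantizers, rng)
        total += 2.0 * (1.0 - math.cos(theta - theta_hat))
    return total / num_samples

for n in (1, 4):
    print(n, simulate_staggered(2, n, num_samples=50_000))
```

With these conventions the $LN$ cell centres are equally spaced by $2\pi/(LN)$ and equally likely, so adding uniform noise on an arc of that length gives a reconstruction that is uniform on the circle, which is the perfect perceptual quality requirement.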

Theorem 1.

In the unit-circle setting, at perfect perceptual quality, $N$ staggered quantizers each with $L$ levels achieve the following rate-distortion pair:

$$(R,D)=\left(\log L,\;2-2\,\frac{\sin(\pi/(LN))}{\pi/(LN)}\,\frac{\sin(\pi/L)}{\pi/L}\right).$$

The result subsumes the special case $N=1$ and $L=2$ given in [5].
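The closed form in Theorem 1 is easy to evaluate numerically, which shows how quickly the distortion approaches the dithered limit as $N$ grows. The function names below are hypothetical, and the limit $N\to\infty$ is obtained simply by letting the first sinc factor tend to $1$.

```python
import math

def sinc(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def staggered_distortion(num_levels, num_quantizers):
    """Distortion from Theorem 1: 2 - 2 * sinc(pi/(L*N)) * sinc(pi/L)."""
    return 2.0 - 2.0 * sinc(math.pi / (num_levels * num_quantizers)) * sinc(math.pi / num_levels)

def dithered_limit(num_levels):
    """Limit of the expression above as N grows without bound."""
    return 2.0 - 2.0 * sinc(math.pi / num_levels)

for n in (1, 2, 4, 16, 256):
    print(f"L = 2, N = {n:3d}: D = {staggered_distortion(2, n):.4f}")
print(f"L = 2, N -> infinity: D = {dithered_limit(2):.4f}")
```

For $L=2$ the formula gives $2-8/\pi^{2}\approx 1.189$ at $N=1$ and tends to $2-4/\pi\approx 0.727$, the two values quoted earlier for the deterministic-encoder and dithered quantizers.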

Proof of Theorem 1.

Since each of the $N$ quantizers is uniform with $L$ levels, the rate for the corresponding quantization procedure is $\log L$. Due to symmetry, we analyze the distortion with a fixed quantizer. The arc (in angle) on which samples are quantized to the same index has length $(2\pi)/L$ since there are $L$ levels, and the inserted decoder noise is uniformly distributed on an arc of length $(2\pi)/(NL)$ placed at the center of that arc, since there are also $N$ quantizers. Since $\|(\cos(\theta),\sin(\theta))-(\cos(\alpha),\sin(\alpha))\|^{2}=2(1-\cos(\theta-\alpha))$, the distortion can then be calculated as

$$\begin{aligned}
D&=\frac{L}{2\pi}\frac{LN}{2\pi}\int_{-\pi/L}^{\pi/L}\left(\int_{-\pi/(NL)}^{\pi/(NL)}2(1-\cos(\theta-\alpha))\,\mathrm{d}\alpha\right)\mathrm{d}\theta\\
&=2+\frac{L^{2}N}{2\pi^{2}}\int_{-\pi/L}^{\pi/L}\Big(\sin(\theta-\pi/(NL))-\sin(\theta+\pi/(NL))\Big)\,\mathrm{d}\theta\\
&=2+\frac{L^{2}N}{\pi^{2}}\left(\cos\Big(\frac{\pi}{L}\frac{N+1}{N}\Big)-\cos\Big(\frac{\pi}{L}\frac{N-1}{N}\Big)\right)\\
&=2-2\,\frac{\sin(\pi/(NL))}{\pi/(NL)}\,\frac{\sin(\pi/L)}{\pi/L},
\end{aligned}$$

which is the desired result. ∎

The next two theorems provide the fundamental limits of RDP coding and single-shot coding in the unit-circle setting.

Theorem 2.

In the unit-circle setting, the information-theoretic rate-distortion trade-off with perfect perceptual quality $R(D,0)$ is given by the pairs parametrized by $\lambda>0$:

$$\Big\{(R,D)=\Big(\log(2\pi)-h(Z),\;\mathbb{E}[2-2\cos(Z)]\Big):\;Z\sim p(z;\lambda)=\frac{e^{\lambda\cos(z)}}{\int_{-\pi}^{\pi}e^{\lambda\cos(z^{\prime})}\,\mathrm{d}z^{\prime}},\;\lambda>0\Big\}.$$

Note that this is the best that can be achieved using infinitely large coding blocks, and it is in general impossible to achieve using single-shot coding.
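The parametric curve in Theorem 2 has no elementary closed form, but it can be traced numerically. The sketch below is one possible way to do so, assuming a plain Riemann sum over a uniform grid on $(-\pi,\pi)$ and reporting the rate in bits; the function names, the grid size and the bisection range are illustrative choices, not part of the original analysis.

```python
import math

def rdp_point(lam, grid_size=2048):
    """Return (R in bits, D) from Theorem 2 for parameter lam, using a Riemann sum."""
    dz = 2.0 * math.pi / grid_size
    zs = [-math.pi + (i + 0.5) * dz for i in range(grid_size)]
    weights = [math.exp(lam * math.cos(z)) for z in zs]
    norm = sum(weights) * dz
    density = [w / norm for w in weights]                        # p(z; lam) on the grid
    entropy_nats = -sum(p * math.log(p) for p in density) * dz   # differential entropy h(Z)
    distortion = sum(p * (2.0 - 2.0 * math.cos(z)) for p, z in zip(density, zs)) * dz
    rate_bits = (math.log(2.0 * math.pi) - entropy_nats) / math.log(2.0)
    return rate_bits, distortion

def distortion_at_rate(target_bits, lo=1e-6, hi=50.0, iters=60):
    """Bisection on lam, assuming the rate increases with lam on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rdp_point(mid)[0] < target_bits:
            lo = mid
        else:
            hi = mid
    return rdp_point(0.5 * (lo + hi))[1]

for lam in (0.5, 1.0, 2.0, 4.0):
    rate, dist = rdp_point(lam)
    print(f"lambda = {lam:3.1f}: R = {rate:.4f} bits, D = {dist:.4f}")
print("D at R = 1 bit:", distortion_at_rate(1.0))
```

Evaluating the curve at $R=1$ bit gives a reference point against which the one-shot distortions $2-8/\pi^{2}$ and $2-4/\pi$ of the unit-circle quantizers can be compared.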

Proof of Theorem 2.

We aim to minimize the rate-distortion Lagrangian with perfect perceptual quality for any Lagrange multiplier $\lambda>0$, i.e.,

$$\min_{p_{\hat{X}|X}:\,\hat{X}\stackrel{d}{=}X}I(X;\hat{X})+\lambda\,\mathbb{E}[\|X-\hat{X}\|^{2}].\tag{3}$$

Due to perfect perceptual quality, the reconstructed signal $\hat{X}$ must lie on the unit circle, and we can represent $\hat{X}$ by its angle $\theta(\hat{X})$. The MSE distortion term $\|X-\hat{X}\|_{2}^{2}$ can be written as $2(1-\cos(\theta(X)-\theta(\hat{X})))$. The mutual information can be lower bounded by

$$\begin{aligned}
I(X;\hat{X})&=h(X)-h(X|\hat{X})\geq h(X)-h(X-\hat{X})\\
&=h(\theta(X))-h(\theta(X)-\theta(\hat{X})).
\end{aligned}\tag{4}$$

For simplicity, from here on we will write $\theta=\theta(X)$ and $\hat{\theta}=\theta(\hat{X})$, and denote $\beta:=\hat{\theta}-\theta$, so that $\hat{\theta}=\theta+\beta$. Note that $h(\theta-\hat{\theta})=h(-\beta)=h(\beta)$ and $\cos(\theta-\hat{\theta})=\cos(\beta)$, so the sign convention does not affect the expressions below.

Since $h(\theta(X))$ is a constant, we can consider the optimization problem below, equivalent to lower-bounding (3):

$$\operatorname*{minimize}_{p(\beta)}\;-h(\beta)+2\lambda\,\mathbb{E}[(1-\cos(\beta))].\tag{5}$$

Using simple calculus of variations, it can be verified that the optimal distribution of $\beta$ for the optimization above is $p(\beta)=\frac{e^{2\lambda\cos(\beta)}}{\int_{-\pi}^{\pi}e^{2\lambda\cos(\beta')}\,\mathrm{d}\beta'}$. Since $\beta$ is independent of $\theta$, the sum $\hat{\theta}=\theta+\beta$ (taken modulo $2\pi$) has a uniform distribution over $[-\pi,\pi]$. Thus this distribution indeed provides a lower bound to (3).
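As a numerical sanity check of this step, the following Python sketch evaluates the objective in (5) on a grid and compares the stated density against two alternatives. The grid size, the value of $\lambda$, the perturbed density, and all function names are illustrative assumptions. The sketch relies on the standard identity $\int_{-\pi}^{\pi}e^{\kappa\cos\beta}\,\mathrm{d}\beta=2\pi I_0(\kappa)$, under which the optimal value of (5) works out to $2\lambda-\ln(2\pi I_0(2\lambda))$ in nats.

```python
import numpy as np

def objective(p, beta, lam):
    """Riemann-sum estimate of -h(beta) + 2*lam*E[1 - cos(beta)] for a density p on the grid."""
    d = beta[1] - beta[0]
    neg_entropy = np.sum(p * np.log(p)) * d            # -h(beta) in nats
    mean_one_minus_cos = np.sum(p * (1.0 - np.cos(beta))) * d
    return neg_entropy + 2.0 * lam * mean_one_minus_cos

def candidate_density(beta, lam, shift=0.0):
    """Density proportional to exp(2*lam*cos(beta - shift)) on the circle."""
    w = np.exp(2.0 * lam * np.cos(beta - shift))
    return w / (2.0 * np.pi * np.i0(2.0 * lam))

lam = 1.5                                               # assumed Lagrange multiplier
beta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)

p_star = candidate_density(beta, lam)                   # density stated in the text
p_unif = np.full_like(beta, 1.0 / (2.0 * np.pi))        # uniform alternative
p_pert = 0.9 * p_star + 0.1 * candidate_density(beta, lam, shift=0.7)  # perturbed alternative

j_star = objective(p_star, beta, lam)
j_closed = 2.0 * lam - np.log(2.0 * np.pi * np.i0(2.0 * lam))

print("objective at stated density:", j_star)
print("closed-form value          :", j_closed)
print("objective at uniform       :", objective(p_unif, beta, lam))
print("objective at perturbed     :", objective(p_pert, beta, lam))
```

Because the objective is strictly convex in $p$, any other valid density, such as the two alternatives above, should give a strictly larger value.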

To show that they are in fact equal, we only need to observe that in (4), the only inequality can be written as

$$
\begin{aligned}
I(X;\hat{X}) &= h(X)-h(X|\hat{X}) = h(\theta)-h(\theta|\hat{\theta}) \\
&= h(\theta)-h(\beta|\hat{\theta}) \geq h(\theta)-h(\beta).
\end{aligned}
\tag{6}
$$

However, observe that we have

$$
\begin{aligned}
p_{\beta|\hat{\theta}}(\beta|\hat{\theta}) &= \frac{p_{\beta,\hat{\theta}}(\beta,\hat{\theta})}{p_{\hat{\theta}}(\hat{\theta})} = \frac{p_{\beta,\theta}(\beta,\hat{\theta}-\beta)}{p_{\hat{\theta}}(\hat{\theta})} \\
&= \frac{p_{\beta}(\beta)\,p_{\theta}(\hat{\theta}-\beta)}{p_{\hat{\theta}}(\hat{\theta})} = p_{\beta}(\beta),
\end{aligned}
$$

where the last step holds because both $\theta$ and $\hat{\theta}$ are marginally uniformly distributed. This implies that $\beta$ is in fact independent of $\hat{\theta}$, so $h(\beta|\hat{\theta})=h(\beta)$, and therefore (6) holds with equality, which establishes the overall equality. Thus the rate-distortion pairs are indeed characterized by that given in Theorem 2.

It is not difficult to verify that the curve (or function) above is continuous, and that its epigraph is non-empty, closed, and lies in the upper right quadrant. Each point on the curve has a supporting hyperplane, since it is a solution to the corresponding Lagrangian optimization. Thus, by the partial converse of the supporting hyperplane theorem, the curve is convex. ∎

Theorem 3.

In the unit-circle setting, the optimal scalar quantization (single shot coding) trade-off between the coding rate and the distortion with perfect perceptual quality is the piece-wise linear function with the following extreme points

$$
\left\{(R,D)=\left(\log L,\ 2-2\frac{\sin(\pi/L)}{\pi/L}\right): L=1,2,3,\ldots\right\},
$$

which can be achieved by dithered quantizations.
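To make the shape of this trade-off concrete, the following Python sketch tabulates the extreme points of Theorem 3 and linearly interpolates between adjacent ones. Measuring the rate in bits (base-2 logarithm) and the function names are assumptions made for illustration; the theorem statement itself does not fix the logarithm base.

```python
import math

def extreme_point(L, log_base=2.0):
    """Extreme point (R, D) = (log L, 2 - 2*sin(pi/L)/(pi/L)) for an L-level quantizer."""
    rate = math.log(L, log_base)
    x = math.pi / L
    distortion = 2.0 - 2.0 * math.sin(x) / x
    return rate, distortion

def distortion_at_rate(rate, log_base=2.0):
    """Piece-wise linear interpolation between the two extreme points that bracket `rate`."""
    L = max(1, math.floor(log_base ** rate + 1e-12))   # largest L with log L <= rate
    r0, d0 = extreme_point(L, log_base)
    r1, d1 = extreme_point(L + 1, log_base)
    t = (rate - r0) / (r1 - r0)                        # time-sharing fraction
    return (1.0 - t) * d0 + t * d1

for L in range(1, 6):
    r, d = extreme_point(L)
    print(f"L={L}: R={r:.4f} bits, D={d:.4f}")

print("D at R=0.5 bits:", distortion_at_rate(0.5))
```

With $L=1$ the distortion is $2$, the value obtained when the reconstruction is independent of the source, and it decreases toward $0$ as $L$ grows.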

As $N\rightarrow\infty$, we see that $\frac{\sin(\pi/(LN))}{\pi/(LN)}\rightarrow 1$; therefore, the performance of the staggered quantizer approaches that of dithered quantization in this setting. Due to the uniform data source distribution, dithered quantizers are optimal, and $N$ staggered quantizers with $L$ levels each do not offer any advantage over dithered quantizers. However, as we will discuss in the next section, this is not the case in general, since the flexibility in entropy coding can lead to an additional edge.

Proof of Theorem 3.

Any codec $(f,g)$ can be represented by $f:\mathcal{X}\times\mathbb{R}\rightarrow\mathbb{Z}$ and $g:\mathbb{Z}\times\mathbb{R}\rightarrow\mathcal{X}$. The signal $X$ is encoded by $f(X,V)$ to some integer and then reconstructed by $\hat{X}=g(f(X,V),V)$, where $V$ is the common randomness.

Due to the perfect perceptual quality requirement, the reconstructed signal $\hat{X}$ must lie on the unit circle. Without considering perceptual quality, we first characterize the optimal scalar quantization under the condition that the reconstruction $\hat{X}$ lies on the unit circle. For any Lagrange multiplier $\lambda>0$, consider minimizing the following rate-distortion Lagrangian with decision variables $(f,g,V)$:

$$
\begin{aligned}
&H(f(X;V)|V)+\lambda\,\mathbb{E}_{X,V}[d(X,g(f(X;V);V))] \\
&\quad=\mathbb{E}_{V}\big[\mathbb{E}_{X}[-\log(\mathbb{P}(f(X;V)|V))+\lambda\,d(X,g(f(X;V);V))\,|\,V]\big].
\end{aligned}
$$

It suffices to study deterministic quantizers, since for any stochastic quantizer $(f,g,V)$, there exists a deterministic quantizer $(f(\cdot;v),g(\cdot;v))$, obtained from some realization $V=v$, whose Lagrangian is at most that of the stochastic quantizer.

It is straightforward to verify that the optimal deterministic quantizer in this setting must have contiguous regions (pathological cases may exist for complex distributions [gyorgy2002structure]), i.e., the region in $\mathcal{X}$ of the same index $f(\cdot,v)$ should be contiguous. For such a quantizer with $L$ levels, i.e., $|f(\cdot,v)|=L$, it can then be shown using calculus that it must be a uniform quantizer. The optimal scalar quantization (single shot coding) trade-off between the coding rate and the distortion is the piece-wise linear function with the following extreme points

$$
\left\{(R,D)=\left(\log L,\ 2-2\frac{\sin(\pi/L)}{\pi/L}\right): L=1,2,3,\ldots\right\}.
$$

This piece-wise linear function is a lower bound when perfect perceptual quality is additionally required. Moreover, it is straightforward to verify that dithered quantization has perfect perceptual quality and achieves the extreme points, and thus matches the lower bound. Thus the optimal scalar quantization trade-off between the coding rate and the distortion with perfect perceptual quality is also the piece-wise linear function above and can be achieved by time-sharing dithered quantizers. ∎
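The claim that dithered quantization attains the extreme points can be checked by simulation. The Python sketch below uses one standard construction, a subtractive uniform dither on the circle, which is an assumption about the form of the dithered quantizer; the sample size, seed, and function names are likewise illustrative. It estimates the distortion of an $L$-level dithered quantizer and checks that the reconstructed angle shows no preferred direction, as a uniform distribution on the circle requires.

```python
import numpy as np

def encode(theta, v, L):
    """Cell index in {0, ..., L-1} of the dithered angle theta + v."""
    step = 2.0 * np.pi / L
    return np.floor((theta + v) / step).astype(int) % L

def decode(idx, v, L):
    """Cell centre minus the shared dither, wrapped to [-pi, pi)."""
    step = 2.0 * np.pi / L
    theta_hat = (idx + 0.5) * step - v
    return np.mod(theta_hat + np.pi, 2.0 * np.pi) - np.pi

def simulate(L, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi, np.pi, n)              # source angle, uniform on the circle
    v = rng.uniform(0.0, 2.0 * np.pi / L, n)           # common randomness (dither)
    theta_hat = decode(encode(theta, v, L), v, L)
    mse = np.mean(2.0 * (1.0 - np.cos(theta - theta_hat)))
    return mse, theta_hat

for L in (1, 2, 3, 4):
    mse, theta_hat = simulate(L)
    target = 2.0 - 2.0 * np.sin(np.pi / L) / (np.pi / L)
    print(f"L={L}: simulated D={mse:.4f}, formula D={target:.4f}, "
          f"mean cos/sin of output=({np.mean(np.cos(theta_hat)):+.3f}, "
          f"{np.mean(np.sin(theta_hat)):+.3f})")
```

In this construction the error angle is uniform on $[-\pi/L,\pi/L)$, which is where the factor $\frac{\sin(\pi/L)}{\pi/L}$ comes from, and each of the $L$ indices is equally likely given the dither, so the rate is $\log L$.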

IV Design of staggered quantizers for general scalar sources

Consider applying the staggered quantization approach to a general scalar source. Assuming there are $N$ uniform quantizers to be staggered, the encoding function $f_n(x)$ for the $n$-th quantizer with stepsize $\Delta$ is

$$
f_n(x)=\left\lfloor\frac{x}{\Delta}-\frac{n}{N}\right\rceil,\quad n=0,1,2,\ldots,N-1, \tag{7}
$$

where $\lfloor\cdot\rceil$ is the operation that rounds to the nearest integer.
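A minimal Python sketch of the encoder in (7) follows. The tie-breaking rule (halves rounded up) and the function names are assumptions, since the text only specifies rounding to the nearest integer; Python's built-in `round` is avoided because it rounds halves to the nearest even integer.

```python
import math

def round_nearest(y):
    """Round to the nearest integer, with ties rounded up (an assumed convention)."""
    return math.floor(y + 0.5)

def f_n(x, n, N, delta):
    """Encoder of the n-th staggered quantizer: round(x/delta - n/N)."""
    if not 0 <= n < N:
        raise ValueError("quantizer index n must satisfy 0 <= n < N")
    return round_nearest(x / delta - n / N)

# The same sample seen by N = 4 staggered quantizers with stepsize 1.
x = 0.9
print([f_n(x, n, N=4, delta=1.0) for n in range(4)])   # [1, 1, 0, 0]
```

The output index changes once the offset $n/N$ moves the sample across a cell boundary, which is the sense in which the $N$ quantizers are staggered.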

To achieve perfect perceptual quality, decoder-side randomness must be used, yet due to the potential non-uniformity of the distribution, it is more involved than simply subtracting certain random values. To present the procedure, first denote the density of the data source $X$ as $p_X(x)$ and denote by $F_X(x)=\mathbb{P}(X\leq x)$ its cumulative distribution function. Denote its inverse as $F^{-1}_X(t)\triangleq\inf\{x:F_X(x)>t\}$ for any $t\in[0,1)$. Let us introduce a density function on $[a,b]$ as $q_{a,b}(x)\triangleq\frac{p_X(x)}{\int_a^b p_X(t)\,dt}$. A random variable generated privately at the decoder side according to this distribution is denoted as $\tilde{Z}_{a,b}$, which is independent of all the other random variables.
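One way to generate $\tilde{Z}_{a,b}$ is inverse-CDF sampling restricted to $[a,b]$: draw $U$ uniformly on $[0,1)$ and return $F_X^{-1}(F_X(a)+U(F_X(b)-F_X(a)))$. The Python sketch below illustrates this for a standard logistic source, which is chosen only because its CDF and quantile function have closed forms; the source distribution and all names are assumptions and not part of the procedure described here.

```python
import math
import random

def logistic_cdf(x):
    """F_X for a standard logistic source (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_quantile(t):
    """Inverse of logistic_cdf for t in (0, 1)."""
    return math.log(t / (1.0 - t))

def sample_truncated(a, b, rng):
    """Draw one sample from q_{a,b}: the source density restricted to [a, b] and renormalized."""
    lo, hi = logistic_cdf(a), logistic_cdf(b)
    u = rng.random()                                   # private decoder randomness
    return logistic_quantile(lo + u * (hi - lo))

rng = random.Random(0)
samples = [sample_truncated(-0.5, 1.0, rng) for _ in range(20_000)]
frac_below_zero = sum(s <= 0.0 for s in samples) / len(samples)
expected = (logistic_cdf(0.0) - logistic_cdf(-0.5)) / (logistic_cdf(1.0) - logistic_cdf(-0.5))
print(f"fraction of samples <= 0: {frac_below_zero:.4f} (expected about {expected:.4f})")
```

For a continuous, strictly increasing $F_X$ such as this one, the generalized inverse in the text coincides with the ordinary inverse used in the sketch.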

Define an indexing function $m(x,n)=N\cdot f_n(x)+n$, which essentially specifies an order of all the quantization cells in all these $N$ quantizers. Define its inverse at index $j$ as $m^{-1}_X(j)\triangleq\inf\{x:\exists n\in[0:N-1],\ m(x,n)=j\}$. Intuitively, for each quantizer and quantizer cell index pair $(n,f_n(x))$, the reconstruction at the decoder is a random variable whose distribution matches the data sample distribution on an interval. To specify this interval, we define a sequence of boundaries $(a(j),b(j))_{j\in\mathbb{Z}}$ as

$$
a(j)\triangleq F_X^{-1}\left(\sum_{k=1}^{N}\frac{F_X(m_X^{-1}(j-k))}{N}\right),\quad b(j-1)\triangleq a(j).
$$
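The definitions above can be transcribed directly into code. The Python sketch below does so for a standard logistic source, again chosen only for its closed-form CDF and quantile function. The source, the tie-breaking rule in the rounding, and all names are assumptions. The closed form used for the inverse index map, $\Delta(j/N-1/2)$, is derived from (7) by writing $j=Ni+n$ and taking the left edge of cell $i$ of quantizer $n$; it is not stated in the text.

```python
import math

def f_n(x, n, N, delta):
    """Encoder (7): round x/delta - n/N to the nearest integer (ties rounded up, an assumption)."""
    return math.floor(x / delta - n / N + 0.5)

def m(x, n, N, delta):
    """Indexing function m(x, n) = N * f_n(x) + n."""
    return N * f_n(x, n, N, delta) + n

def m_inv(j, N, delta):
    """inf{x : m(x, n) = j for some n}, i.e. the left edge of the cell with index j."""
    return delta * (j / N - 0.5)

def F(x):
    """Hypothetical source CDF: standard logistic."""
    return 1.0 / (1.0 + math.exp(-x))

def F_inv(t):
    """Inverse of F for t in (0, 1)."""
    return math.log(t / (1.0 - t))

def a(j, N, delta):
    """Boundary a(j) exactly as written: F^{-1} of the average of F at m_inv(j-1), ..., m_inv(j-N)."""
    avg = sum(F(m_inv(j - k, N, delta)) for k in range(1, N + 1)) / N
    return F_inv(avg)

def b(j, N, delta):
    """b(j) = a(j + 1), from the convention b(j - 1) = a(j)."""
    return a(j + 1, N, delta)

N, delta = 3, 1.0
for j in range(-3, 4):
    # The source mass between a(j) and b(j) telescopes to
    # (F(m_inv(j)) - F(m_inv(j - N))) / N, i.e. 1/N of the mass of a width-delta window.
    mass = F(b(j, N, delta)) - F(a(j, N, delta))
    print(f"j={j:+d}: a={a(j, N, delta):+.4f}, b={b(j, N, delta):+.4f}, mass={mass:.4f}")
```

Since consecutive intervals share endpoints, the intervals $[a(j),b(j)]$ tile the real line in increasing order of $j$.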

The encoding and reconstruction process can now be described as follows. Given data source $X$ at the encoder side, the encoding procedure selects, uniformly at random, one of the $N$ encoders $\{f_0,f_1,\ldots,f_{N-1}\}$ with stepsize $\Delta$. The index $n$ of the selected encoder is common randomness shared with the decoder, and the data sample is encoded as $f_n(X)$. At the decoder, we compute the index $j$ from $f_n(x)$ and $n$ using the indexing function $m(\cdot)$, and the reconstruction is a random sample $\hat{X}=\tilde{Z}_{a(j),b(j)}$. More formally, the decoding function upon receiving code $f_n(X)=i$ is

$$
g(i)=\tilde{Z}_{a(j),b(j)},\quad\text{with } j=Ni+n, \tag{8}
$$

where $n$ is the common randomness of the offset quantizer index. We remark here that the offsets can be viewed as a random dither which takes discrete values in $\{0,1/N,2/N,\ldots,(N-1)/N\}$. However, for each realization, the reconstruction is sampled in an interval, unlike in classic deterministic quantizers or dithered quantizers. An illustration is given in Fig. 3.

Figure 3: Staggered quantizers for general probability distributions.

Since the number of staggered quantizers is small, it is possible to design a tailored entropy code for each. This is impossible for dithered quantizers, which results in a rate close to $H(f(X+Z))$. Dithered quantization also suffers because $X+\tilde{Z}$ induces a loss of perceptual quality, and an additional shaping step is required. As shown in Fig. 4, the proposed approach can sometimes outperform both dithered quantizers and deterministic encoders. In particular, even mixing $2$ quantizers appears to provide competitive performance.

Figure 4: Quantization of a uniformly distributed source on an interval

V Conclusion

We consider RDP coding from a quantizer design perspective. By decomposing dithered quantization, we obtain staggered quantizers as intermediates between the two extremes of dithered quantization and quantization without common randomness. This new perspective provides a new way to understand one-shot coding for RDP.

References

  • [1] A. Gersho and R. M. Gray, Vector quantization and signal compression.   Springer, 1992, vol. 159.
  • [2] T. Berger, Rate distortion theory: A mathematical basis for data compression.   Prentice-Hall series in information and system sciences, 1971.
  • [3] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in International Conference on Machine Learning.   PMLR, 2019, pp. 675–685.
  • [4] A. B. Wagner, “The rate-distortion-perception tradeoff: The role of common randomness,” arXiv preprint arXiv:2202.04147, 2022.
  • [5] L. Theis and E. Agustsson, “On the advantages of stochastic encoders,” in Neural Compression: From Information Theory to Applications–Workshop@ ICLR 2021, 2021.
  • [6] J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, “On the rate-distortion-perception function,” IEEE Journal on Selected Areas in Information Theory, vol. 3, no. 4, pp. 664–673, 2022.
  • [7] G. Serra, P. A. Stavrou, and M. Kountouris, “Computation of rate-distortion-perception function under f-divergence perception constraints,” arXiv preprint arXiv:2305.04604, 2023.
  • [8] Y. Hamdi and D. Gündüz, “The rate-distortion-perception trade-off with side information,” arXiv preprint arXiv:2305.13116, 2023.
  • [9] S. Salehkalaibar, J. Chen, A. Khisti, and W. Yu, “Rate-distortion-perception tradeoff based on the conditional-distribution perception measure,” arXiv preprint arXiv:2401.12207, 2024.
  • [10] X. Niu, D. Gündüz, B. Bai, and W. Han, “Conditional rate-distortion-perception trade-off,” arXiv preprint arXiv:2305.09318, 2023.
  • [11] C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018.
  • [12] J. Ziv, “On universal quantization,” IEEE Transactions on Information Theory, vol. 31, no. 3, pp. 344–347, 1985.
  • [13] R. Zamir and M. Feder, “On universal quantization by randomized uniform/lattice quantizers,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 428–436, 1992.
  • [14] N. Saldi, T. Linder, and S. Yüksel, “Randomized quantization and source coding with constrained output distribution,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 91–106, 2014.
  • [15] C. Tian, “A new class of multiple description scalar quantizer and its application to image coding,” IEEE Signal Processing Letters, vol. 12, no. 4, pp. 329–332, 2005.
  • [16] ——, “Staggered lattices in multiple description quantization,” in Data Compression Conference, 2005, pp. 398–407.
  • [17] U. Samarawickrama, J. Liang, and C. Tian, “$m$-channel multiple description coding with two-rate coding and staggered quantization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 7, pp. 933–944, 2010.
  • [18] R. Zamir, Lattice Coding for Signals and Networks: A Structured Coding Approach to Quantization, Modulation, and Multiuser Information Theory.   Cambridge University Press, 2014.
  • [19] L. Theis and A. B. Wagner, “A coding theorem for the rate-distortion-perception function,” in Neural Compression: From Information Theory to Applications–Workshop@ ICLR 2021, 2021.
  • [20] Z. Yan, F. Wen, R. Ying, C. Ma, and P. Liu, “On perceptual lossy compression: The cost of perceptual reconstruction and an optimal training framework,” in International Conference on Machine Learning.   PMLR, 2021, pp. 11682–11692.
Optimality of Uniform Quantizers in the Proof of Theorem 3.

Consider two adjacent Voronoi cells. Suppose the two adjacent regions have a total size (in terms of the angle spanned) of $2\pi r$ for some $r\in(0,1]$; moreover, suppose the first Voronoi cell is of size $2\pi\alpha$ for some $\alpha\in(0,r)$. For optimal partitions, $\alpha$ must be a maximizer of the following function

$$
\begin{aligned}
l(\alpha;r) &= (r-\alpha)\ln(r-\alpha)+\frac{\lambda}{\pi}\sin(\pi(r-\alpha)) \\
&\quad+\alpha\ln(\alpha)+\frac{\lambda}{\pi}\sin(\pi\alpha).
\end{aligned}
$$

Its derivative is

$$
\begin{aligned}
l'(\alpha;r) = &-\ln(r-\alpha)-\lambda\cos(\pi(r-\alpha)) \\
&+\ln(\alpha)+\lambda\cos(\pi\alpha)
\end{aligned}
$$

and its second derivative is

$$
l''(\alpha;r)=\frac{1}{r-\alpha}+\frac{1}{\alpha}-\lambda\pi\left(\sin(\pi(r-\alpha))+\sin(\pi\alpha)\right).
$$
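The two derivative expressions can be checked against finite differences. The Python sketch below does this for one arbitrary choice of $r$, $\lambda$, and $\alpha$; these values, the step size, and the helper names are illustrative assumptions. It also checks that $l$ is symmetric about $\alpha=r/2$ and that $l'$ vanishes there.

```python
import math

def l(alpha, r, lam):
    """The function l(alpha; r) from the appendix."""
    return ((r - alpha) * math.log(r - alpha)
            + lam / math.pi * math.sin(math.pi * (r - alpha))
            + alpha * math.log(alpha)
            + lam / math.pi * math.sin(math.pi * alpha))

def l_prime(alpha, r, lam):
    """First derivative of l with respect to alpha."""
    return (-math.log(r - alpha) - lam * math.cos(math.pi * (r - alpha))
            + math.log(alpha) + lam * math.cos(math.pi * alpha))

def l_second(alpha, r, lam):
    """Second derivative of l with respect to alpha."""
    return (1.0 / (r - alpha) + 1.0 / alpha
            - lam * math.pi * (math.sin(math.pi * (r - alpha)) + math.sin(math.pi * alpha)))

def central_diff(fn, x, h=1e-5):
    """Second-order central finite difference of a scalar function."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

r, lam, alpha = 0.5, 2.0, 0.2                          # arbitrary test point
fd1 = central_diff(lambda t: l(t, r, lam), alpha)
fd2 = central_diff(lambda t: l_prime(t, r, lam), alpha)
print("l'  formula vs finite difference:", l_prime(alpha, r, lam), fd1)
print("l'' formula vs finite difference:", l_second(alpha, r, lam), fd2)
print("symmetry gap l(alpha) - l(r - alpha):", l(alpha, r, lam) - l(r - alpha, r, lam))
print("l'(r/2):", l_prime(r / 2.0, r, lam))
```

The vanishing derivative at the midpoint is what makes $\alpha=r/2$, the equal-cell split, a stationary point for every $\lambda$.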

It is not hard to verify that, with respect to the midpoint $\alpha=r/2$, $l$ and $l''$ are even functions and $l'$ is an odd function. There are two circumstances:

  1. $\lambda$ is small, and $l''(\alpha;r)\geq 0$. Then $l(\alpha;r)$ is a non-constant symmetric convex function whose optimal value is achieved by $\alpha\rightarrow 0$ or $\alpha\rightarrow r$, which conflicts with the fact that the optimal quantizer has non-empty Voronoi cells.

  2. $\lambda$ is large, and $l''(\alpha;r)$ is positive at both ends and negative in the middle, so $l'(\alpha;r)$ is first increasing, then decreasing, then increasing. Then $l(\alpha;r)$ either has a maximum at $\alpha=r/2$, or its maximum is approached as $\alpha\rightarrow 0$ or $\alpha\rightarrow r$.

Therefore any two adjacent non-empty Voronoi cells have the same size. The optimal quantizer thus must have equal-sized Voronoi cells, and is therefore a uniform quantizer. ∎