Linearization of optimal rates for independent zero-error source and channel problems

Nicolas Charpenay, Maël Le Treust, and Aline Roumy

The work of Nicolas Charpenay was conducted during his PhD at IRISA UMR 6074 and Centre Inria de l’Université de Rennes, funded by CDSN ENS Paris-Saclay. This work was presented in part at the IEEE Information Theory Workshop (ITW) 2023 in Saint-Malo, France [DOI: 10.1109/ITW55543.2023.10161637], and in part at the IEEE International Symposium on Information Theory (ISIT) 2023 in Taipei, Taiwan [DOI: 10.1109/ISIT54713.2023.10206770].

Nicolas Charpenay is with Univ Rennes, CNRS, IRMAR UMR 6625, F-35000 Rennes, France (e-mail: [email protected]). Maël Le Treust is with Univ. Rennes, CNRS, Inria, IRISA UMR 6074, F-35000 Rennes, France (e-mail: [email protected]). Aline Roumy is with Centre Inria de l’Université de Rennes, France (e-mail: [email protected]).

Abstract

Zero-error coding encompasses a variety of source and channel problems where the probability of error must be exactly zero. The zero-error constraint differs from the vanishing-error constraint: the latter only requires the probability of error to go to zero as the block length of the code goes to infinity. Under the zero-error constraint, many problems change from a statistical nature to a combinatorial one, which is tied to the encoder's lack of knowledge about what is observed by the decoder. In this paper, we investigate two unsolved zero-error problems: source coding with side information and channel coding. We focus our attention on families of independent problems for which the distribution decomposes into a product of distributions, corresponding to solved zero-error problems. A crucial step is the linearization property of the optimal rate, which does not always hold in the zero-error regime, unlike in the vanishing-error regime. By generalizing recent results of Wigderson and Zuiddam, and of Schrijver, we derive a condition under which the linearization properties of the complementary graph entropy $\overline{H}$ and of the zero-error capacity $C_{0}$ for the AND product of graphs and for the disjoint union of graphs are all equivalent. This provides new single-letter characterizations of $\overline{H}$ and $C_{0}$ , for example when the graph is a product of perfect graphs, which is not perfect in general, and for the class of graphs obtained by the product of a perfect graph $G$ with the pentagon graph $C_{5}$ . By building on Haemers' result, we also show that the linearization of the complementary graph entropy does not hold for the product of the Schläfli graph with its complementary graph.

I Introduction

Transmitting information without any errors has been a concern for Shannon since the beginning of his work. In his seminal paper [1], Shannon proposed a construction for zero-error source coding, a problem soon solved by Huffman in [2]. Shortly after establishing the channel capacity in [1], Shannon turned his attention to channel coding with zero error in [3], instead of vanishing error. This subtle difference radically changes the nature of the problem, which becomes essentially combinatorial rather than probabilistic. The single-letter characterization of the zero-error capacity is a notoriously difficult open problem. For example, the zero-error capacity of the noisy-typewriter channel with $7$ letters is unknown; lower and upper bounds are stated in [4], [5], and [6]. In fact, the zero-error property only depends on the support of the channel conditional distribution $P_{Y|X}$ , not on the probability values. More precisely, the zero-error property is captured by the characteristic graph, which encodes the problem data in its structure: the vertices are the channel inputs $\mathcal{X}$ , and two symbols $x$ and $x^{\prime}$ are adjacent if they are “confusable”, i.e. if they can produce the same channel output $y$ with positive probability. For sequences of symbols $x^{n}$ , the characteristic graph is obtained by iterating the AND product ( $\wedge$ ) of the graph with itself. In order to prevent any decoding error, a zero-error codebook must be composed of pairwise non-adjacent codewords. Thus, the size of the optimal codebook is the size of a maximum independent set, called the independence number. More specifically, the zero-error capacity $C_{0}$ is the asymptotic exponential growth rate of the independence number of the iterated AND products of the characteristic graph. This means that all channel distributions with the same characteristic graph (in particular, with the same support) have the same zero-error capacity.
Over time, this open question has attracted a lot of attention in Information Theory [7, Chap. 11] and in Combinatorics and Graph Theory, see [8, Chap. 27].

Figure 1: The characteristic graph $C_{7}$ of the noisy-typewriter channel with $7$ letters.

This problem inspired Berge’s notion of perfect graphs [9, pp. 382], for which the zero-error capacity is given by the one-shot independence number [10, Theorem 4.18]. Graphs with odd cycles are also related to Berge’s conjecture [11], later proved in [12] by Chudnovsky et al., namely “a graph $G$ is perfect if and only if neither $G$ nor its complementary graph $\overline{G}$ have odd cycles of length $5$ or more.” Since the zero-error capacity of the pentagon graph $C_{5}$ has been characterized by Shannon [3] and Lovász [13], as has the zero-error capacity of perfect graphs, see [10, Theorem 4.18], the graph $C_{7}$ depicted in Fig. 1 is the minimal connected graph for which the zero-error capacity is still an open problem.

In the source coding framework, an important unsolved zero-error problem was posed by Witsenhausen in [14] when the decoder has side information. In this problem, depicted in Fig. 2, the encoder shares information about a source $X$ , exploiting the side-information $Y$ observed by the decoder but not by itself. In the vanishing error regime, Slepian and Wolf in [15] showed that the optimal rate is $H(X|Y)$ , but no single-letter characterization is available in the zero-error regime. As for the channel coding problem, the zero-error property is embedded into the characteristic graph $G$ constructed with respect to the conditional distribution $P_{Y|X}$ of the “side-information channel”. The main difference with the channel coding problem is that the source distribution $P_{X}$ is fixed. Witsenhausen showed in [14] that the fixed-length encoding task becomes equivalent to graph coloring. In [16], Alon and Orlitsky considered the variable-length version, determining an asymptotic expression for the optimal rate based on the chromatic entropy. In [17], Koulgi et al. proved that the optimal rate coincides with the complementary graph entropy $\overline{H}$ , defined by Körner and Longo in [18]. These two expressions are asymptotic, as optimal rates are determined by coloring an infinite product of graphs. Single-letter characterizations are known only in the same cases as for the zero-error capacity, such as perfect graphs and the pentagon graph $C_{5}$ . By contrast, the Körner graph entropy [19] provides a single-letter expression for the unrestricted input setting of Alon and Orlitsky in [16], where the zero-error constraint is satisfied even outside the source’s support; this expression is an upper bound on the optimal rate.

Figure 2: The source coding problem with decoder side information, called side-information problem.

The difficulty of the zero-error problem comes from the fact that the knowledge of the decoder is not included in that of the encoder. This is because the side information is available only at the decoder. Indeed, giving the side information also to the encoder makes it possible to implement a zero-error conditional Huffman code [2] of rate $H(X|Y)$ , as in the vanishing error regime [15]. The asymmetry of knowledge poses the same difficulty in the zero-error channel coding problem. Note that this difficulty is mitigated when the encoder observes the past channel output symbols, leading to the characterization by Shannon of the zero-error capacity with feedback in [3].

In this paper, we focus our attention on zero-error problems composed of a family of independent problems. First, we consider that the source and side-information decompose into a family of independent random variables $(X_{a},Y_{a})_{a\in\mathcal{A}}$ . This induces a characteristic graph with a specific structure given by the AND product of the graphs of the subproblems $G=\wedge_{a\in\mathcal{A}}G_{a}$ . Such a decomposition has two benefits: it reduces the problem complexity and it provides new cases for which the complementary graph entropy is characterized. In the vanishing error regime, the optimal rate is equal to the sum of the optimal rates of the subproblems; we say that the optimal rate linearizes. By building on Haemers' results for the zero-error capacity of the Schläfli graph in [20], we show that independence alone does not ensure the linearization of the complementary graph entropy, in contrast with the standard property of the vanishing error regime. Thus, showing the linearization of the optimal rates becomes crucial.

Our first contribution is to show that the linearization of the complementary graph entropy holds for the AND product of graphs if and only if it holds for the disjoint union of graphs ( $\sqcup$ ), also called “sum of graphs” in [21]. Recently, Wigderson and Zuiddam in [22] and Schrijver in [23] showed a similar statement for the zero-error capacity: the linearization holds for the AND product if and only if it holds for the disjoint union. We then explore the consequences of these two statements, for which the characteristic graphs are defined similarly. An important difference concerns the probability distribution $P_{X}$ , which is specified in the source problem but a priori unspecified in the zero-error channel coding problem.

A natural notion related to both $C_{0}$ and $\overline{H}$ is the zero-error capacity $C(\cdot,P_{X})$ of a graph relative to a distribution $P_{X}$ introduced by Csiszár and Körner in [24]. By taking the maximum over the distribution $P_{X}$ , Gargano et al. [25] showed that it is equal to the zero-error capacity

\displaystyle C_{0}(G)=\max_{P_{X}}C(G,P_{X}).

(1)

Moreover, Marton showed in [26] that the complementary graph entropy satisfies

\displaystyle C(G,P_{X})+\overline{H}(G)=H(X).

(2)

Equations (1) and (2) are the analogues of the channel capacity $C=\max_{P_{X}}I(X;Y)$ and the entropy property $I(X;Y)+H(X|Y)=H(X)$ in the vanishing error regime. The main contribution of the paper is to show that the linearization properties of $C_{0}$ , $C(\cdot,P_{X})$ and $\overline{H}$ for the AND product and for the disjoint union are all equivalent, provided that the source distribution $P_{X}$ maximizes $C(\cdot,P_{X})$ when evaluated with respect to the AND product of graphs.

These linearization properties enlarge the class of problems for which $C_{0}$ and $\overline{H}$ have a single-letter characterization. For perfect graphs, we show that the linearizations of $C_{0}$ and $\overline{H}$ always hold for the AND product and for the disjoint union. As a consequence, we determine new single-letter characterizations for the products of perfect graphs that are not perfect in general, and for the product of a perfect graph $G$ with the pentagon graph $C_{5}$ , by building on the characterization of $\overline{H}(G\sqcup C_{5})$ stated in [21].

A crucial notion is the set of capacity-achieving distributions, which contains all the distributions $P_{X}$ for which $C_{0}=C(\cdot,P_{X})$ . We show that the uniform distribution is capacity-achieving when the graph is vertex-transitive, i.e. when all vertices play the same role within the graph. Since the Schläfli graph $S$ and its complementary graph $\overline{S}$ are vertex-transitive, as is their product $S\wedge\overline{S}$ , the uniform distributions are capacity-achieving for $S$ , $\overline{S}$ and $S\wedge\overline{S}$ . Together with Haemers' result [20], this provides a counterexample where the linearizations of $C_{0}$ , $C(\cdot,P_{X})$ and $\overline{H}$ for the AND product and the disjoint union of $S$ and $\overline{S}$ do not hold.

In Sec. II, we study the linearization of the complementary graph entropy $\overline{H}$ for the source problem with side information. The connection with the linearization of the zero-error capacity $C_{0}$ is investigated in Sec. III. New single-letter characterizations for $C_{0}$ and $\overline{H}$ are provided in Sec. IV, as well as the counter-example of linearization based on the Schläfli graph.

II Zero-Error Source Coding With Decoder Side Information

II-A Problem Statement and Results from the Literature

The zero-error source coding problem with decoder side information is depicted in Fig. 2. It corresponds to a situation in data compression where the decoder has side-information $Y$ about the source $X$ that has to be retrieved. This problem was formulated by Slepian and Wolf in [15] in the vanishing error regime and by Witsenhausen in [14] for the zero-error variant. We call this the side-information problem.

More formally, we assume that a sequence of i.i.d. random variables $(X^{n},Y^{n})$ of length $n\in\mathbb{N}^{\star}$ is drawn according to $P_{X,Y}\in\Delta(\mathcal{X}\times\mathcal{Y})$ where $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. We consider variable-length source coding, which encompasses the special case of fixed-length source coding. An $(n,\phi_{e},\phi_{d})$ variable-length side-information source code for $X$ and $Y$ consists of

- an encoder $\phi_{e}:\mathcal{X}^{n}\rightarrow\{0,1\}^{*}$ that assigns to each $x^{n}$ a binary string such that $\operatorname*{Im}\phi_{e}$ is prefix-free,
- a decoder $\phi_{d}:\mathcal{Y}^{n}\times\{0,1\}^{*}\rightarrow\mathcal{X}^{n}$ that assigns an estimate $\widehat{x}^{n}$ to each pair $(y^{n},\phi_{e}(x^{n}))$ .

The rate of the $(n,\phi_{e},\phi_{d})$ -code is the average length of the codeword per source symbol, i.e. $R\doteq\frac{1}{n}\mathbb{E}[\ell\circ\phi_{e}(X^{n})]$ , and the probability of error is $P_{e}^{(n)}\doteq\mathbb{P}\big{(}\widehat{X}^{n}\neq X^{n}\big{)}$ .

Definition 1.

The optimal rate in the vanishing error regime is the minimal rate among all codes that satisfy the $\varepsilon$ -error constraint with $\varepsilon\rightarrow 0$ :

\displaystyle R^{\star}\doteq\lim_{\displaystyle\varepsilon\rightarrow 0}\quad\inf\limits_{\displaystyle(n,\phi_{e},\phi_{d}):P_{e}^{(n)}\leq\varepsilon}\quad\frac{1}{n}\mathbb{E}[\ell\circ\phi_{e}(X^{n})].

(3)

The optimal rate in the zero-error regime is the minimal rate among all coding schemes that satisfy the zero-error constraint:

\displaystyle R^{\star}_{0}\doteq\inf\limits_{\displaystyle(n,\phi_{e},\phi_{d}):P_{e}^{(n)}=0}\quad\frac{1}{n}\mathbb{E}[\ell\circ\phi_{e}(X^{n})].

(4)

When the side-information $Y$ is available at both encoder and decoder, the optimal rates in the vanishing and zero-error regimes are equal to $H(X|Y)$ . The zero-error coding construction relies on conditional Huffman coding [2]. In the side-information problem, the encoder does not observe the side-information $Y$ . This asymmetry of information has a consequence: the optimal rates in the vanishing and zero-error regimes differ in general.

Theorem 1 (from [15, Theorem 2]).

The optimal rate in the vanishing error regime is

\displaystyle R^{\star}=H(X|Y).

(5)

The nature of the problem changes when considering an error probability equal to zero, instead of a vanishing error probability. In the zero-error regime, the characterization of the optimal rate $R^{\star}_{0}$ is a notoriously difficult open problem of combinatorial nature. The key features of the side-information problem are captured by the “characteristic graph” introduced by Witsenhausen in [14], which we review below.

Definition 2 (Characteristic graph).

Let $\mathcal{X},\mathcal{Y}$ be two finite sets and $P_{Y|X}$ be a conditional distribution. The characteristic graph $G=(\mathcal{X},\mathcal{E})$ associated to $P_{Y|X}$ is defined by:

- $\mathcal{X}$ as set of vertices,
- $x,x^{\prime}\in\mathcal{X}$ are adjacent, i.e. $xx^{\prime}\in\mathcal{E}$ , if $P_{Y|X}(y|x)P_{Y|X}(y|x^{\prime})>0$ for some $y\in\mathcal{Y}$ .

A characteristic graph is a probabilistic graph $G=(\mathcal{X},\mathcal{E},P_{X})$ , when it has the underlying distribution $P_{X}$ on its vertices.
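Definition 2 can be illustrated with a minimal sketch. The function name `characteristic_graph` and the dictionary encoding of $P_{Y|X}$ are assumptions made for this example only, and the 5-letter noisy typewriter is a hypothetical channel chosen because its characteristic graph is the pentagon $C_{5}$ .

```python
from itertools import combinations

def characteristic_graph(channel):
    """Return (vertices, edges) of the characteristic graph of P_{Y|X}.

    `channel[x][y]` holds P_{Y|X}(y|x); only its support matters.
    """
    vertices = sorted(channel)
    edges = set()
    for x, x2 in combinations(vertices, 2):
        # x and x2 are adjacent if some output y has positive probability under both
        shared = [y for y, p in channel[x].items()
                  if p > 0 and channel[x2].get(y, 0) > 0]
        if shared:
            edges.add(frozenset((x, x2)))
    return vertices, edges

# Hypothetical channel: input x produces x or x+1 (mod 5).
typewriter = {x: {x: 0.5, (x + 1) % 5: 0.5} for x in range(5)}
# Same support with different probability values.
skewed = {x: {x: 0.9, (x + 1) % 5: 0.1} for x in range(5)}

vertices, edges = characteristic_graph(typewriter)
_, edges_skewed = characteristic_graph(skewed)
print(sorted(tuple(sorted(e)) for e in edges))
print(edges == edges_skewed)
```

Both channels have the same support, so they yield the same graph, in line with the remark that the zero-error property does not depend on the probability values.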

The meaning of the characteristic graph is that, when the side information $y$ does not make it possible to distinguish exactly between the source realizations $x$ and $x^{\prime}$ , then $x$ and $x^{\prime}$ are adjacent, and must be mapped to different codewords. Therefore, a zero-error encoding is a graph coloring for which adjacent vertices are mapped to different colors.

Definition 3 (Coloring, chromatic number $\chi$ ).

Let $G=(\mathcal{X},\mathcal{E})$ be a graph. A mapping $c:\mathcal{X}\rightarrow\mathcal{C}$ is a coloring if for all adjacent vertices $x$ , $x^{\prime}$ with $xx^{\prime}\in\mathcal{E}$ , we have $c(x)\neq c(x^{\prime})$ . The chromatic number $\chi(G)$ is the smallest $|\mathcal{C}|$ such that there exists a coloring $c:\mathcal{X}\rightarrow\mathcal{C}$ of $G$ .

For sequences of symbols with underlying distribution $P^{n}_{Y|X}(y^{n}|x^{n})=\prod_{t=1}^{n}P_{Y|X}(y_{t}|x_{t})$ , two sequences of source inputs $x^{n},x^{\prime n}$ are adjacent in the graph if $P^{n}_{Y|X}(y^{n}|x^{n})P^{n}_{Y|X}(y^{n}|x^{\prime n})>0$ for some sequence of channel outputs $y^{n}$ , i.e. if and only if, for all $1\leq t\leq n$ , either $x_{t}=x^{\prime}_{t}$ or $x_{t}x_{t}^{\prime}\in\mathcal{E}$ . This implies that for sequences of symbols, the characteristic graph is built by using the AND product of graphs, denoted by $\wedge$ , also called “strong product” or “normal product” in [27, 26], and defined below.

Definition 4 (AND product $\wedge$ ).

Let $G_{1}=(\mathcal{X}_{1},\mathcal{E}_{1},P_{X_{1}})$ , $G_{2}=(\mathcal{X}_{2},\mathcal{E}_{2},P_{X_{2}})$ be two probabilistic graphs, their AND product $G_{1}\wedge G_{2}$ is a probabilistic graph defined by:

- $\mathcal{X}_{1}\times\mathcal{X}_{2}$ as set of vertices,
- $(x_{1}x_{2}),(x^{\prime}_{1}x^{\prime}_{2})$ are adjacent if $x_{1}x^{\prime}_{1}\in\mathcal{E}_{1}$ and $x_{2}x^{\prime}_{2}\in\mathcal{E}_{2}$ , with the convention of self-adjacency for all vertices,
- $P_{X_{1}}\otimes P_{X_{2}}$ as probability distribution on the vertices.

We denote by $G_{1}^{\wedge n}$ the $n$ -th AND power: $G_{1}^{\wedge n}=G_{1}\wedge...\wedge G_{1}$ $n$ times.
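The AND product can be explored numerically on small graphs. The sketch below is an illustration only: the helper names `cycle`, `and_product` and `independence_number` are hypothetical, probability distributions are omitted, and the exhaustive branching search is practical only for a few dozen vertices. It computes the independence numbers of the pentagon $C_{5}$ and of $C_{5}\wedge C_{5}$ .

```python
from itertools import product

def cycle(n):
    """Adjacency sets of the cycle graph C_n on vertices 0..n-1."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

def and_product(g1, g2):
    """AND (strong) product: two distinct pairs are adjacent when each
    coordinate is either equal or adjacent (self-adjacency convention)."""
    adj = {}
    for u1, u2 in product(g1, g2):
        adj[(u1, u2)] = {
            (v1, v2)
            for v1 in g1[u1] | {u1}
            for v2 in g2[u2] | {u2}
            if (v1, v2) != (u1, u2)
        }
    return adj

def independence_number(adj):
    """Exact independence number by branching on a maximum-degree vertex."""
    if not adj:
        return 0
    v = max(adj, key=lambda u: len(adj[u]))
    if not adj[v]:
        # No edges left: every remaining vertex can be selected.
        return len(adj)

    def remove(removed):
        return {u: nb - removed for u, nb in adj.items() if u not in removed}

    # Either v is left out, or v is selected and its neighbours are left out.
    return max(independence_number(remove({v})),
               1 + independence_number(remove({v} | adj[v])))

c5 = cycle(5)
alpha_1 = independence_number(c5)
alpha_2 = independence_number(and_product(c5, c5))
print(alpha_1, alpha_2)
```

The second value exceeds the square of the first, which is the classical reason why the one-shot independence number does not determine the zero-error capacity of the pentagon.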

Unlike the vanishing error regime, there is no single-letter characterization of the optimal rate in the zero-error regime. We present two different asymptotic expressions which rely on codebooks composed of codewords that form a coloring of the AND product of the characteristic graph $G^{\wedge n}$ .

Alon and Orlitsky introduced an asymptotic expression in [16], for the optimal rate in the “restricted inputs” setting. The optimal rate $R_{0}^{\star}$ relies on the notion of chromatic entropy $H_{\chi}(G^{\wedge n})$ , which is the minimal entropy of a coloring of $G^{\wedge n}$ .

Definition 5 (Chromatic entropy $H_{\chi}$ ).

The chromatic entropy of the probabilistic graph $G=(\mathcal{X},\mathcal{E},P_{X})$ is defined by

\displaystyle H_{\chi}(G)=\inf\Big{\{}H(c(X))\Big{|}c\text{ is a coloring of }G\Big{\}}.

(6)
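For small graphs, the chromatic entropy of Definition 5 can be evaluated by exhaustive search. The following is a minimal sketch under that assumption; the function name `chromatic_entropy` is hypothetical, and the example graph is the pentagon $C_{5}$ with the uniform distribution.

```python
from itertools import product
from math import log2

def chromatic_entropy(vertices, edges, p_x):
    """Minimum of H(c(X)) over all colorings c, by exhaustive search."""
    n = len(vertices)
    best = float("inf")
    for colors in product(range(n), repeat=n):
        c = dict(zip(vertices, colors))
        if any(c[u] == c[v] for u, v in edges):
            continue  # adjacent vertices share a color: not a coloring
        # Distribution of the color c(X) induced by p_x.
        mass = {}
        for v in vertices:
            mass[c[v]] = mass.get(c[v], 0.0) + p_x[v]
        h = -sum(q * log2(q) for q in mass.values() if q > 0)
        best = min(best, h)
    return best

vertices = list(range(5))
edges = [(v, (v + 1) % 5) for v in vertices]
uniform = {v: 1 / 5 for v in vertices}
h_chi = chromatic_entropy(vertices, edges, uniform)
print(round(h_chi, 4))
```

This is the one-shot quantity $H_{\chi}(G)$ only; the optimal rate involves the normalized limit over the AND powers $G^{\wedge n}$ , which can be strictly smaller.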

Theorem 2 (from [16, Lemma 6]).

For every probabilistic graph $G=(\mathcal{X},\mathcal{E},P_{X})$ ,

\displaystyle R^{\star}_{0}=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}(G^{\wedge n}).

(7)

Even though there is no single-letter expression for $R^{\star}_{0}$ , Alon and Orlitsky provided a single-letter upper bound in [16] by adding the constraint called “unrestricted inputs”. This constraint requires the zero-error property to be satisfied even for the sequences of symbols $(X^{n},Y^{n})$ that take values out of the support of $P^{n}_{X,Y}$ .

Figure 3: The pentagon graph $C_{5}$ with uniform distribution $P_{X}=\operatorname*{Unif}\big{(}\{1,...,5\}\big{)}$ over the vertices.

With high probability, the source sequence $X^{n}$ is typical with respect to $P_{X}$ . Let $G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]$ be the subgraph of $G^{\wedge n}$ induced by the set of typical sequences $\mathcal{T}^{n}_{\varepsilon}(P_{X})$ with tolerance $\varepsilon>0$ , see [7, Definition 2.8]. The zero-error code consists of a coloring of this induced subgraph $G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]$ with a minimum number of colors $\chi\big{(}G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]\big{)}$ , where $\chi$ denotes the chromatic number of the graph. The encoder sends the color index to the decoder if $X^{n}$ is typical, otherwise it sends the index of the sequence $X^{n}$ in $\mathcal{X}^{n}$ . This coding strategy has a rate upper-bounded by

\displaystyle\frac{1}{n}+\mathbb{P}\big{(}X^{n}\notin\mathcal{T}^{n}_{\varepsilon}(P_{X})\big{)}\log|\mathcal{X}|+\frac{1}{n}\log\chi\big{(}G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]\big{)}.

(8)

The zero-error property is satisfied since the decoder is able to retrieve $X^{n}$ thanks to $Y^{n}$ and the color symbol. Koulgi et al. have shown in [17, Theorem 1] that taking the limit when $n$ goes to infinity and $\varepsilon$ goes to $0$ yields the best achievable rate in the zero-error side-information problem. This quantity, introduced by Körner and Longo in [18], is called the complementary graph entropy.

Definition 6.

For every probabilistic graph $G=(\mathcal{X},\mathcal{E},P_{X})$ , the complementary graph entropy $\overline{H}(G)$ is defined by:

\displaystyle\overline{H}(G)=\lim_{\varepsilon\rightarrow 0}\limsup_{n\rightarrow\infty}\frac{1}{n}\log\chi\big{(}G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]\big{)}.

(9)

Theorem 3 (from [17, Theorem 1]).

The optimal rate in the zero-error regime is

\displaystyle R^{\star}_{0}=\overline{H}(G),

(10)

where $G$ is the probabilistic graph formed of the characteristic graph associated to the distribution $P_{Y|X}$ , with the underlying distribution $P_{X}$ on its vertices.

A trivial single-letter upper bound is given by $H(X)$ where the zero-error coding construction relies on Huffman coding and the decoder ignores the side information $Y$ . In fact, this upper bound is tight for a dense subset of distributions in $\Delta(\mathcal{X}\times\mathcal{Y})$ .

Proposition 1 (Full support, from [14]).

If the distribution $P_{X,Y}$ has full support, then $R^{\star}_{0}=H(X)$ .

Proof.

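Since the distribution $P_{X,Y}$ has full support, the characteristic graph $G$ is complete, i.e. every pair of distinct symbols $x\in\mathcal{X}$ , $x^{\prime}\in\mathcal{X}$ is adjacent in $G$ . The same holds for $G^{\wedge n}$ for all $n$ , so every coloring is injective and $H_{\chi}(G^{\wedge n})=nH(X)$ . By Theorem 2, $R^{\star}_{0}=H(X)$ , which concludes the proof of Prop. 1. ∎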

There are a few other cases where the optimal zero-error rate is known such as perfect graphs, or the pentagon $C_{5}$ with uniform distribution shown in Fig. 3 where $R^{\star}_{0}=\frac{1}{2}\log_{2}(5)$ . In general, the single-letter characterization of $R^{\star}_{0}$ remains a difficult open question.
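The pentagon value can be combined with Marton's identity (2) as a quick consistency check. The computation below is an illustration only: it takes $\overline{H}(C_{5})=\frac{1}{2}\log_{2}5$ for the uniform distribution from the text above, and uses the zero-error capacity of the pentagon, $C_{0}(C_{5})=\frac{1}{2}\log_{2}5$ , characterized in [3] and [13].

```latex
% Illustration: P_X is assumed uniform on the 5 vertices, so H(X) = log_2 5.
\begin{align*}
C\big(C_{5},\operatorname{Unif}\big)
  &= H(X)-\overline{H}(C_{5})
   = \log_{2}5-\tfrac{1}{2}\log_{2}5
   = \tfrac{1}{2}\log_{2}5
   = C_{0}(C_{5}).
\end{align*}
```

Under these assumptions, the uniform distribution attains the maximum in (1) for the pentagon.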

II-B Independent Zero-Error Side-Information problems

In order to understand the problem’s difficulty, we examine a specific scenario where the source and side information decompose into independent variables. In the vanishing error regime, independence is a key assumption that induces the linearization of optimal rates, shedding light on practical coding techniques. In the zero-error regime, the independence hypothesis alone is insufficient for linearization of optimal rates. We specify a hypothesis that ensures linearization, enabling us to enlarge the set of problems for which the optimal rate has a single-letter characterization.

Figure 4: Independent side-information problems

More formally, for a finite set $\mathcal{A}$ , we assume a set of pairs $(X_{a},Y_{a})_{a\in\mathcal{A}}$ , referred to as an independent family, that consists of $|\mathcal{A}|$ pairs with joint distribution that decomposes as a product of distributions. This independent family generates sequences of i.i.d. random variables,

\displaystyle\big{(}X_{1}^{n},Y_{1}^{n},\ldots,X_{|\mathcal{A}|}^{n},Y_{|\mathcal{A}|}^{n}\big{)}\sim\Big{(}P_{X_{1},Y_{1}}\otimes\ldots\otimes P_{X_{|\mathcal{A}|},Y_{|\mathcal{A}|}}\Big{)}^{\otimes n}.

(11)

Independent side-information problems correspond to a side-information problem in which the source and the side information are an independent family, as shown in Fig. 4. In the vanishing error regime, the optimal rate linearizes:

\displaystyle R^{\star}=H\big{(}X_{1},\ldots,X_{|\mathcal{A}|}\big{|}Y_{1},\ldots,Y_{|\mathcal{A}|}\big{)}=\sum_{a\in\mathcal{A}}H(X_{a}|Y_{a})=\sum_{a\in\mathcal{A}}R_{a}^{\star}.

(12)

This property is fundamental because it means that the construction of an optimal codebook results from the concatenation of the codewords of the optimal codebooks for each subproblem.
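The linearization (12) in the vanishing error regime can be checked numerically on a toy example. This is a minimal sketch; the two joint distributions `p1` and `p2` are hypothetical values chosen for illustration, and `cond_entropy` is a helper name introduced here.

```python
from itertools import product
from math import log2

def cond_entropy(p_xy):
    """H(X|Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (_, y), p in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * log2(p / p_y[y]) for (_, y), p in p_xy.items() if p > 0)

# Two hypothetical independent pairs (X_1, Y_1) and (X_2, Y_2).
p1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p2 = {(0, 0): 0.5, (1, 0): 0.25, (1, 1): 0.25}

# Joint pmf of ((X_1, X_2), (Y_1, Y_2)) under the product distribution.
joint = {((x1, x2), (y1, y2)): q1 * q2
         for ((x1, y1), q1), ((x2, y2), q2) in product(p1.items(), p2.items())}

lhs = cond_entropy(joint)                   # H(X_1, X_2 | Y_1, Y_2)
rhs = cond_entropy(p1) + cond_entropy(p2)   # H(X_1 | Y_1) + H(X_2 | Y_2)
print(abs(lhs - rhs) < 1e-9)
```

The two sides agree up to floating-point error for any product distribution, which is the additivity of conditional entropy under independence.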

But does linearization also hold in the zero-error regime for independent side-information problems? To answer this question, we first derive an asymptotic expression for the optimal zero-error rate. This derivation follows from the fact that the independent family can be characterized by a product of graphs.

Proposition 2.

Let $(X_{a},Y_{a})_{a\in\mathcal{A}}$ be an independent family. The optimal rate for the independent zero-error side-information problems is

\displaystyle R^{\star}_{0}=\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right),

(13)

where for all $a\in\mathcal{A}$ , $G_{a}$ is the characteristic graph associated to the conditional distribution $P_{Y_{a}|X_{a}}$ , with the underlying probability distribution $P_{X_{a}}$ on its vertices.

It is known that the complementary graph entropy $\overline{H}(\cdot)$ is sublinear with respect to the AND product. Indeed, [21, Theorem 2] states that for all probabilistic graphs $G$ and $G^{\prime}$

\displaystyle\overline{H}(G\wedge G^{\prime})\leq\overline{H}(G)+\overline{H}(G^{\prime}).

(14)

However, $\overline{H}(\cdot)$ does not linearize in general. Inspired by Haemers' result [20], we show in Theorem 19 that the inequality (14) is strict for the Schläfli graph $S$ and its complement $\overline{S}$ . In the following, we study a condition that allows for the linearization of $\overline{H}(\cdot)$ , i.e. under which (14) holds with equality. To do this, we introduce the disjoint union of graphs, also called “sum of graphs” in [21].

Definition 7 (Disjoint union of probabilistic graphs $\sqcup$ ).

Let $\mathcal{A}$ be a finite set, let $P_{A}\in\Delta(\mathcal{A})$ , and let $G_{a}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})$ be probabilistic graphs, for all $a\in\mathcal{A}$ . The disjoint union with respect to $P_{A}$ is a probabilistic graph $(\mathcal{X},\mathcal{E},P_{X})$ denoted by $\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}$ and defined by:

- $\mathcal{X}=\bigsqcup_{a\in\mathcal{A}}\mathcal{X}_{a}$ is the disjoint union of the sets $(\mathcal{X}_{a})_{a\in\mathcal{A}}$ ;
- For all $x,x^{\prime}\in\mathcal{X}$ , $xx^{\prime}\in\mathcal{E}$ if and only if they both belong to the same $\mathcal{X}_{a}$ and $xx^{\prime}\in\mathcal{E}_{a}$ ;
- $P_{X}=\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}$ ; note that the $(P_{X_{a}})_{a\in\mathcal{A}}$ have disjoint supports in $\mathcal{X}$ .

The disjoint union of graphs without probability distribution has the vertex set and edges defined above, without underlying probability distribution.
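Definition 7 can be illustrated with a short sketch that builds the disjoint union of the two probabilistic graphs of Fig. 5. The function name `disjoint_union` and the tagging of vertices as pairs $(a,x)$ are assumptions made for this example; exact fractions are used so that the vertex probabilities can be read off directly.

```python
from fractions import Fraction as F

def disjoint_union(graphs, p_a):
    """Disjoint union of probabilistic graphs with respect to p_a.

    `graphs[a]` is (vertices, edges, p_x); vertices of the union are pairs (a, x).
    """
    vertices, edges, p_x = [], set(), {}
    for a, (vs, es, px) in graphs.items():
        for x in vs:
            vertices.append((a, x))
            # P_X = sum_a P_A(a) P_{X_a}, with disjoint supports.
            p_x[(a, x)] = p_a[a] * px[x]
        # Edges only join vertices of the same component.
        edges |= {frozenset(((a, x), (a, x2))) for x, x2 in es}
    return vertices, edges, p_x

# Graphs of Fig. 5: the empty graph N_3 and the complete graph K_2.
g1 = ([0, 1, 2], [], {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)})
g2 = ([0, 1], [(0, 1)], {0: F(1, 3), 1: F(2, 3)})

vertices, edges, p_x = disjoint_union({1: g1, 2: g2}, {1: F(1, 4), 2: F(3, 4)})
for v in vertices:
    print(v, p_x[v])
```

The resulting graph has five vertices and a single edge, and the vertex probabilities sum to one.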

An example of an AND product and a disjoint union of probabilistic graphs is shown in Fig. 5. Note that, as with the AND product, the complementary graph entropy $\overline{H}(\cdot)$ is sublinear with respect to the disjoint union. Indeed, [21, Theorem 2] states that for all probabilistic graphs $G$ and $G^{\prime}$ and $s\in[0,1]$ ,

\displaystyle\overline{H}(G\overset{(s,1-s)}{\sqcup}G^{\prime})\leq s\overline{H}(G)+(1-s)\overline{H}(G^{\prime}).

(15)

Figure 5: An empty graph $G_{1}=(N_{3},(\frac{1}{4},\frac{1}{2},\frac{1}{4}))$ and a complete graph $G_{2}=(K_{2},(\frac{1}{3},\frac{2}{3}))$ , along with their AND product $G_{1}\wedge G_{2}$ and their disjoint union $G_{1}\sqcup G_{2}$ with respect to $(\frac{1}{4},\frac{3}{4})$ . The underlying distributions are represented by the numbers on each vertex.

We present our first main result, which is that the linearization of the complementary graph entropy with respect to the AND product holds if and only if the linearization holds with respect to the disjoint union. An important consequence of this linearization property is that the concatenation of optimal codes for the subproblems is optimal.

Theorem 4 (Equivalence of the linearization of $\wedge$ and $\sqcup$ ).

Let $\mathcal{A}$ be a finite set, $P_{A}$ a distribution with full support, and let $(G_{a})_{a\in\mathcal{A}}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})_{a\in\mathcal{A}}$ be a family of probabilistic graphs. The following equivalence holds:

\displaystyle\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in\mathcal{A}}\overline{H}(G_{a}),

(16)

\displaystyle\Longleftrightarrow\quad\overline{H}\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a}).

(17)

We say that the linearization property holds if (16) or (17) are satisfied.
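As a sanity check of Theorem 4, consider the special case, chosen here only for illustration, where every $G_{a}$ is a complete graph. The AND product of complete graphs is complete, and the complementary graph entropy of a complete probabilistic graph equals the entropy of its vertex distribution (Proposition 1 together with Theorem 3). Since the $(X_{a})_{a\in\mathcal{A}}$ are independent, (16) holds, and Theorem 4 then gives the value of the disjoint union.

```latex
% Illustration only: every G_a is assumed to be a complete graph.
\begin{align*}
\overline{H}\Big(\bigwedge_{a\in\mathcal{A}}G_{a}\Big)
  &= H\big((X_{a})_{a\in\mathcal{A}}\big)
   = \sum_{a\in\mathcal{A}}H(X_{a})
   = \sum_{a\in\mathcal{A}}\overline{H}(G_{a}),
  && \text{so (16) holds,}\\
\overline{H}\Big(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}\Big)
  &= \sum_{a\in\mathcal{A}}P_{A}(a)\,H(X_{a})
  && \text{by (17).}
\end{align*}
```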

An important consequence of this theorem is that it allows the single-letter characterization of optimal rates for new sources, as discussed in Sec. IV. For example, it characterizes the optimal rate for the product of perfect graphs, which is not perfect in general. Indeed, the disjoint union of perfect graphs is perfect, so the optimal rates linearize for the disjoint union, and by Theorem 4 the optimal rate for the AND product of perfect graphs is equal to the sum of the rates of each graph.

Without loss of generality, we consider that $\mathcal{A}$ is the support of $P_{A}$ . We observe that (16) does not depend on the distribution $P_{A}$ , therefore if (17) holds for a distribution $P_{A}$ with full support, then it holds for all distributions with full support. This remark, along with the continuity of the function $P_{A}\mapsto\overline{H}(\overset{P_{A}}{\sqcup}\cdot)$ stated in Lemma 2 below, allows us to state the following corollary.

Corollary 1.

If the linearization properties (16) or (17) hold for a family of probabilistic graphs $(G_{a})_{a\in\mathcal{A}}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})_{a\in\mathcal{A}}$ , then (16) and (17) also hold for any subfamily of probabilistic graphs $(G_{\tilde{a}})_{\tilde{a}\in\widetilde{\mathcal{A}}}$ with $\widetilde{\mathcal{A}}\subset\mathcal{A}$ .

II-C Sketch of Proof of Theorem 4

The proof of Theorem 4 is given in App. A. It relies on Lemma 1, which follows from the definition of $\overline{H}$ that involves the AND powers of graphs, and from the distributivity of $\wedge$ with respect to $\sqcup$ .

Lemma 1.

\displaystyle\overline{H}\left(\bigsqcup_{a\in\mathcal{A}}^{\operatorname*{Unif}(\mathcal{A})}G_{a}\right)=\frac{1}{|\mathcal{A}|}\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right).

(18)
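The distributivity of $\wedge$ with respect to $\sqcup$ used for Lemma 1 can be checked on small examples. The sketch below ignores the probability distributions and compares edge sets only; the helper names and the choice of path graphs are assumptions made for this illustration.

```python
from itertools import combinations, product

def and_product(g1, g2):
    """AND product of graphs given as (vertex list, set of frozenset edges)."""
    def adjacent(u, v):
        ok1 = u[0] == v[0] or frozenset((u[0], v[0])) in g1[1]
        ok2 = u[1] == v[1] or frozenset((u[1], v[1])) in g2[1]
        return ok1 and ok2
    vs = list(product(g1[0], g2[0]))
    es = {frozenset((u, v)) for u, v in combinations(vs, 2) if adjacent(u, v)}
    return vs, es

def disjoint_union(gs):
    """Vertices are tagged pairs (a, x); edges stay inside each component."""
    vs = [(a, x) for a, g in enumerate(gs) for x in g[0]]
    es = {frozenset(((a, x), (a, y)))
          for a, g in enumerate(gs) for x, y in map(tuple, g[1])}
    return vs, es

def path(n):
    """Path graph on vertices 0..n-1 (hypothetical test graph)."""
    return list(range(n)), {frozenset((i, i + 1)) for i in range(n - 1)}

def relabel(v):
    """Map a vertex ((a, x), z) of the left side to (a, (x, z))."""
    return (v[0][0], (v[0][1], v[1]))

g1, g2, g3 = path(2), path(3), path(3)

left = and_product(disjoint_union([g1, g2]), g3)   # (G1 union G2) AND G3
right = disjoint_union([and_product(g1, g3),       # (G1 AND G3) union (G2 AND G3)
                        and_product(g2, g3)])

left_edges = {frozenset(map(relabel, e)) for e in left[1]}
same = set(map(relabel, left[0])) == set(right[0]) and left_edges == right[1]
print(same)
```

After relabeling, both constructions have the same vertices and the same edges on this example.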

According to Lemma 1, the complementary graph entropies of the AND product $\overline{H}(\wedge\>\cdot)$ and of the disjoint union of graphs $\overline{H}(\sqcup\>\cdot)$ are proportional when $P_{A}$ is the uniform distribution. This shows the equivalence between (17) and (16) when $P_{A}$ is uniform. In order to extend this argument to all distributions $P_{A}\in\Delta(\mathcal{A})$ , we need to show the convexity property of the function

\displaystyle\eta:P_{A}\mapsto\overline{H}\Bigg{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\Bigg{)}.

(19)

Lemma 2.

The function $\eta$ is convex and $(\log\max_{a}|\mathcal{X}_{a}|)$ -Lipschitz.

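The proof of Lemma 2 is given in App. A-B. Assume that the linearization (17) holds for the uniform distribution. By the sublinearity (15), applied to the family $(G_{a})_{a\in\mathcal{A}}$ , the convex function $\eta$ is upper-bounded by the linear function $P_{A}\mapsto\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a})$ , and the two coincide at the uniform distribution, which lies in the interior of $\Delta(\mathcal{A})$ . The convexity property then implies that the function $\eta$ linearizes for all distributions $P_{A}\in\Delta(\mathcal{A})$ . This argument, together with Lemma 1, proves Theorem 4.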

It is worth noting that the complementary graph entropies of the AND product $\overline{H}(\wedge\>\cdot)$ and of the disjoint union of graphs $\overline{H}(\sqcup\>\cdot)$ remain proportional on a dense subset of $\Delta(\mathcal{A})$ .

Lemma 3.

Let $P_{A}\in\Delta_{k}(\mathcal{A})$ be the type of a sequence of length $k\in\mathbb{N}^{\star}$ . Then

\overline{H}\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\right)=\frac{1}{k}\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge kP_{A}(a)}\right).

(20)

The proof of Lemma 3 is given in App. A-A. Note that Lemma 1 is the special case of Lemma 3 obtained with $k=|\mathcal{A}|$ and $P_{A}$ uniform. The equivalence between (16) and (17) holds for every type $P_{A}\in\Delta_{k}(\mathcal{A})$ of a sequence of length $k\in\mathbb{N}^{\star}$ . Moreover, types are dense in $\Delta(\mathcal{A})$ and the function $\eta$ is continuous on $\Delta(\mathcal{A})$ , so this argument also proves Theorem 4.

II-D Zero-error source coding with side information at the decoder and partial side information at the encoder

In Sec. II-B, we highlighted the significance of the disjoint union of graphs. Indeed, using Theorem 4, we can show the linearization for the disjoint union in order to conclude on the linearization of the AND product. This is simpler due to the fact that the disjoint union has fewer vertices and edges than the AND product. Yet, the usefulness extends further as the disjoint union corresponds to the problem in Fig. 6, where the encoder has partial information about the decoder’s side information, obtained through the deterministic function $g:\mathcal{Y}\rightarrow\mathcal{A}$ . This setting in Fig. 6 is a specific case of Fig. 2, equivalent to a source $(X,g(Y))$ that the decoder must retrieve. We refer to this as the partial-side-information problem.

Figure 6: The partial-side-information problem.

Proposition 3.

When the encoder has partial side information, the optimal rate writes

\displaystyle R^{\star}_{0}=\overline{H}\left(\bigsqcup^{P_{A}}_{a\in\mathcal{% A}}G_{a}\right),

(21)

where for all $a\in\mathcal{A}$ , $G_{a}$ is the characteristic graph associated with the conditional distribution $P_{Y|X,A=a}$ with the underlying probability distribution $P_{X|A=a}$ on its vertices. These conditional distributions and $P_{A}$ are obtained from the joint distribution $P_{XY}\mathbb{1}_{A=g(Y)}$ , which depends on the deterministic function $g:\mathcal{Y}\rightarrow\mathcal{A}$ .

Indeed, for each realization of the encoder side information $a=g(y)$ , we construct a characteristic graph $G_{a}$ to model the sub-problem indexed by $a\in\mathcal{A}$ . Since both the encoder and the decoder have access to $a\in\mathcal{A}$ , the characteristic graph is the disjoint union of the graphs $(G_{a})_{a\in\mathcal{A}}$ . Moreover, each $G_{a}$ contains all realizations $x\in\mathcal{X}$ , and there is an edge between two vertices $x,x^{\prime}$ if and only if $P_{Y|X,A}(y|x,a)P_{Y|X,A}(y|x^{\prime},a)>0$ for some $y\in g^{-1}(a)$ .

If the disjoint union linearizes, it reveals that it is optimal to implement the following coding scheme:

•

For each symbol $a\in\mathcal{A}$ , select the indices $t\in\{1,\ldots,n\}$ of the sequence $a^{n}$ such that $a_{t}=a$ . Denote by $(X^{n_{a}},Y^{n_{a}})$ the corresponding subsequences extracted from $(X^{n},Y^{n})$ of length $n_{a}$ .
•

For each $a\in\mathcal{A}$ , encode the subsequence $(X^{n_{a}},Y^{n_{a}})$ with the optimal codebook for the corresponding independent source, the sequence lengths $(n_{a})_{a\in\mathcal{A}}$ being distinct in general, and concatenate the codewords obtained.

With high probability, the sequence $A^{n}$ belongs to the set of typical sequences $\mathcal{T}^{n}_{\varepsilon}(P_{A})$ , therefore the empirical distribution $(\frac{n_{a}}{n})_{a\in\mathcal{A}}$ converges to $P_{A}$ in probability. The coding rate of the above scheme converges to

\displaystyle\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a})=\overline{H}% \left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}\right),

(22)

which is optimal for the partial-side-information problem.
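The splitting step of the scheme above can be sketched in Python as follows. The sketch only illustrates how the subsequences $(X^{n_{a}},Y^{n_{a}})$ and the empirical frequencies $n_{a}/n$ are extracted, and how the time-sharing rate in (22) is evaluated; the function names and the per-graph rates passed to `time_sharing_rate` are hypothetical placeholders, since computing $\overline{H}(G_{a})$ itself is outside the scope of this example.

```python
from collections import defaultdict

def split_by_side_information(a_seq, x_seq, y_seq):
    """Group the pairs (x_t, y_t) by the value a_t, keeping time order within each group."""
    groups = defaultdict(lambda: ([], []))
    for a, x, y in zip(a_seq, x_seq, y_seq):
        groups[a][0].append(x)
        groups[a][1].append(y)
    return dict(groups)

def empirical_frequencies(a_seq):
    """Empirical distribution (n_a / n) of the side-information sequence a^n."""
    n = len(a_seq)
    counts = defaultdict(int)
    for a in a_seq:
        counts[a] += 1
    return {a: c / n for a, c in counts.items()}

def time_sharing_rate(p_a, rates):
    """Evaluate sum_a P_A(a) * rate(a); `rates` holds assumed values of H-bar(G_a)."""
    return sum(p_a[a] * rates[a] for a in p_a)

a_seq = [0, 1, 0, 0, 1]
x_seq = list("abcde")
y_seq = list("vwxyz")

groups = split_by_side_information(a_seq, x_seq, y_seq)
freqs = empirical_frequencies(a_seq)
# Placeholder per-graph rates, chosen only to exercise the formula.
rate = time_sharing_rate(freqs, {0: 1.0, 1: 0.5})
print(groups, freqs, rate)
```

Both the encoder and the decoder can perform this split, since both observe $a^{n}$.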

III Zero-Error Channel Coding Problem

Recently, Wigderson and Zuiddam in [22] and Schrijver in [23] established the equivalence between the linearization of the zero-error channel capacity of the AND product $C_{0}(\wedge\>\cdot)$ and of the disjoint union of graphs $C_{0}(\sqcup\>\cdot)$ . The main difference with the side-information problem is that the channel input distribution $P_{X}$ is a priori not specified in the zero-error channel coding problem.

Figure 7: Equivalences of linearization properties between the zero-error capacity $C_{0}$ , the zero-error capacity relative to a distribution $C(\cdot,P_{X})$ , and the complementary graph entropy $\overline{H}$ . Our contributions are represented in the dashed rectangles.

In this section, we connect these two zero-error problems by exploring the properties of the zero-error capacity $C(G,P_{X})$ of a graph $G=(\mathcal{X},\mathcal{E})$ relative to a distribution $P_{X}\in\Delta(\mathcal{X})$ , introduced by Csiszár and Körner [24]. We show the equivalence of the linearization properties of [22, Theorem 4.1] and [23, Theorem 2], and of the linearization properties of Theorem 4, provided that the source distribution maximizes the zero-error capacity relative to a distribution of the AND product of graphs. The path taken to demonstrate these equivalences is shown in Fig. 7.

III-A Zero-error channel capacity

The channel coding problem in Fig. 8 is introduced in [1] in the vanishing error regime, and in [3] in the zero-error regime. We consider a Discrete Memoryless Channel (DMC) that consists of an input alphabet $\mathcal{X}$ , a finite output alphabet $\mathcal{Y}$ and a conditional distribution (a.k.a. transition probability) $P_{Y|X}$ . A $(n,\mathcal{C}_{n},\phi_{d})$ -code consists of

-

an encoder that selects uniformly a codeword $x^{n}$ from the codebook $\mathcal{C}_{n}\subseteq\mathcal{X}^{n}$ , and sends it over the DMC,
-

a decoder $\phi_{d}$ that assigns an estimate $\widehat{x}^{n}$ to each received $y^{n}$ .

The rate of the $(n,\mathcal{C}_{n},\phi_{d})$ -code is the average number of messages transmitted per channel use, i.e. $\frac{1}{n}\log|\mathcal{C}_{n}|$ , and the probability of error is $P_{e}^{(n)}\doteq\mathbb{P}\big{(}\widehat{X}^{n}\neq X^{n}\big{)}$ .

Figure 8: The channel coding problem.

Definition 8.

The channel capacity is the maximal rate among all codes that satisfy the ${\varepsilon}$ -error property, with $\varepsilon\rightarrow 0$ :

\displaystyle C^{\star}\doteq\lim_{\displaystyle\varepsilon\rightarrow 0}\quad% \,\sup\limits_{\displaystyle(n,\mathcal{C}_{n},\phi_{d}):P_{e}^{(n)}\leq% \varepsilon}\quad\frac{1}{n}\log|\mathcal{C}_{n}|.

(23)

The zero-error channel capacity is the maximal rate among all coding schemes that satisfy the zero-error property:

\displaystyle C_{0}^{\star}\doteq\sup\limits_{\displaystyle(n,\mathcal{C}_{n},% \phi_{d}):P_{e}^{(n)}=0}\quad\frac{1}{n}\log|\mathcal{C}_{n}|.

(24)

Theorem 5 (from [1]).

The channel capacity (in the vanishing error regime) is

\displaystyle C^{\star}=C\doteq\max_{P_{X}\in\Delta(\mathcal{X})}I(X;Y).

(25)

In the zero-error regime, the capacity depends on a characteristic graph and its independence number, defined below.

Definition 9 (Independent subset, independence number $\alpha$ ).

Let $G=(\mathcal{X},\mathcal{E})$ be a graph. A subset $\mathcal{S}\subseteq\mathcal{X}$ is independent in $G$ if $xx^{\prime}\notin\mathcal{E}$ for all $x,x^{\prime}\in\mathcal{S}$ . The independence number is the maximal size of an independent set in $G$ , and is denoted by $\alpha(G)$ .

Theorem 6 (from [3]).

Let $G$ be the characteristic graph corresponding to the DMC $(\mathcal{X},P_{Y|X},\mathcal{Y})$ . The zero-error channel capacity satisfies

\displaystyle C_{0}^{\star}=C_{0}(G)\doteq\lim_{n\rightarrow\infty}\frac{1}{n}% \log\alpha(G^{\wedge n}).

(26)

Remark 1.

Note that, by convention, we define the zero-error capacity with the logarithm. Another existing convention (for example in [27]) for the zero-error capacity is $\Theta(G)\doteq\lim_{n\rightarrow\infty}\sqrt[n]{\alpha(G^{\wedge n})}$ , which is equivalent in the sense that $C_{0}=\log\Theta$ .

We present in Sec. IV-A some examples from the literature where $C_{0}(G)$ is known, in particular when $G$ is a perfect graph. The Lovász $\theta$ function, introduced in [27], is an upper bound on the zero-error capacity. This function is used to show that $C_{0}(C_{5})=\frac{1}{2}\log 5$ , which makes $C_{5}$ the minimally non-perfect graph for which $C_{0}$ is known. Further observations on the $\theta$ function are derived by Sason in [28]. The zero-error capacity of $C_{7}$ is still unknown. Several existing lower bounds on $C_{0}(C_{7})$ were found via computer programming, in particular in [29], [30] and [31].
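As a small numerical illustration of the pentagon example, the following Python sketch computes $\alpha(C_{5})$ and $\alpha(C_{5}^{\wedge 2})$ by exhaustive branching, which recovers the lower bound $\frac{1}{2}\log 5$ on $C_{0}(C_{5})$ from the second AND power. It assumes that distinct tuples are adjacent in the AND power when every coordinate is equal or adjacent; the function names are illustrative, and the search is only practical for very small graphs.

```python
from itertools import product
from math import log2

def cycle_adjacency(n):
    """Adjacency sets of the cycle graph C_n."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def and_power_adjacency(adj, n):
    """Adjacency of the n-th AND power: distinct tuples, each coordinate equal or adjacent."""
    verts = list(product(adj, repeat=n))
    close = lambda u, v: u == v or v in adj[u]
    return {p: {q for q in verts
                if q != p and all(close(a, b) for a, b in zip(p, q))}
            for p in verts}

def independence_number(adj):
    """Exact independence number by branching on a vertex of maximum remaining degree."""
    def solve(cands):
        if not cands:
            return 0
        v = max(cands, key=lambda u: len(adj[u] & cands))
        if not adj[v] & cands:
            # No edge is left among the candidates: all of them can be taken.
            return len(cands)
        # Either v is excluded, or v is included and its neighbours are discarded.
        return max(solve(cands - {v}), 1 + solve(cands - {v} - adj[v]))
    return solve(frozenset(adj))

c5 = cycle_adjacency(5)
alpha_1 = independence_number(c5)
alpha_2 = independence_number(and_power_adjacency(c5, 2))
# Rate (in bits per channel use) achieved by a maximum independent set of the square.
rate_2 = log2(alpha_2) / 2
print(alpha_1, alpha_2, rate_2)
```

The matching upper bound comes from the Lovász $\theta$ function mentioned above and is not reproduced by this sketch.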

III-B Independent zero-error channel coding problems

To understand why no single-letter characterization exists for the zero-error channel capacity, we study, similarly to Sec. II-B, the case where the channel transition probability decomposes as a product $\bigotimes_{a\in\mathcal{A}}P_{Y_{a}|X_{a}}$ , as depicted in Fig. 9. We refer to this setting as independent channel coding problems.

Figure 9: Independent channel coding problems: the information is transmitted via $|\mathcal{A}|$ parallel channels $(P_{Y_{a}|X_{a}})_{a\in\mathcal{A}}$ .

Proposition 4 (from [3]).

The zero-error capacity of independent channels $(P_{Y_{a}|X_{a}})_{a\in\mathcal{A}}$ is given by

\displaystyle C_{0}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right),

(27)

where for all $a\in\mathcal{A}$ , $G_{a}$ is the characteristic graph associated to the conditional distribution $P_{Y_{a}|X_{a}}$ .

In the vanishing error regime, the capacity of independent channels linearizes since

\displaystyle C=\max_{P_{X_{1},\ldots,X_{|\mathcal{A}|}}}I\big{(}X_{1},\ldots,% X_{|\mathcal{A}|};Y_{1},\ldots,Y_{|\mathcal{A}|}\big{)}=\sum_{a\in\mathcal{A}}% \max_{P_{X_{a}}}I(X_{a};Y_{a})=\sum_{a\in\mathcal{A}}C_{a}.

(28)

Therefore, it is optimal to concatenate the optimal codebooks designed for each channel $P_{Y_{a}|X_{a}}$ .

In the zero-error regime, the capacity is superadditive with respect to the AND product, as shown by Shannon in [3, Theorem 4],

\displaystyle C_{0}(G)+C_{0}(G^{\prime})\leq C_{0}(G\wedge G^{\prime}).

(29)

Haemers shows in [20] that the inequality (29) is strict for the product of the Schläfli graph $S$ and its complementary graph $\overline{S}$ , as stated in Theorem 7. An explicit construction of the Schläfli graph is provided in [32, Sec. 6.1]. As we will see in Sec. IV-C, Haemers’s result relies on a bound on the zero-error capacity based on the rank of the adjacency matrix of the graph. Refinements of this bound are developed by Bukh and Cox in [33], and by Gao et al. in [34].

Theorem 7 (from [20]).

Let $S$ be the Schläfli graph and $\overline{S}$ its complementary graph, then

\displaystyle C_{0}(S)+C_{0}(\overline{S})<C_{0}(S\wedge\overline{S}).

(30)

Since the linearization property does not hold in general, Wigderson and Zuiddam in [22] and Schrijver in [23] recently established a condition under which capacity linearization holds.

Theorem 8 (from [22, Theorem 4.1] and [23, Theorem 2]).

For all graphs $G,G^{\prime}$ ,

	$\displaystyle C_{0}(G)+C_{0}(G^{\prime})=$	$\displaystyle\,C_{0}(G\wedge G^{\prime}),$		(31)
	$\displaystyle\Longleftrightarrow\quad\log\left(2^{C_{0}(G)}+2^{C_{0}(G^{\prime% })}\right)=$	$\displaystyle\,C_{0}(G\sqcup G^{\prime}).$		(32)

We establish the connection between the linearization properties of Theorem 4 and Theorem 8, by using the zero-error capacity $C(\cdot,P_{X})$ of a graph relative to a distribution $P_{X}$ , introduced by Csiszár and Körner in [24].

III-C Zero-error capacity $C(\cdot,P_{X})$ of a graph relative to a distribution $P_{X}$ and equivalence of the linearizations of $C(\cdot,P_{X})$ and $\overline{H}$

Theorems 4 and 8 both establish the equivalence of linearizations for the AND product and disjoint union of graphs but for different quantities: $\overline{H}$ and $C_{0}$ . We will show the equivalence between these two quantities and then deduce the equivalence between the two linearizations. To do this, we study the zero-error capacity $C(\cdot,P_{X})$ of a graph relative to a distribution $P_{X}\in\Delta(\mathcal{X})$ .

Definition 10.

A sequence of codes $(\mathcal{C}_{n})_{n}$ is said to be typical with respect to $P_{X}$ , or in short $P_{X}$ -typical, if

\displaystyle\max_{x^{n}\in\mathcal{C}_{n}}\|T_{x^{n}}-P_{X}\|_{\infty}% \underset{n\rightarrow\infty}{\rightarrow}0.

(33)

The zero-error channel capacity relative to the input distribution $P_{X}$ is the maximal rate among all $P_{X}$ -typical sequences of codes that satisfy the zero-error property:

\displaystyle C^{\star}(P_{X})\doteq\sup\limits_{\displaystyle(n,\mathcal{C}_{% n},\phi_{d}):(\mathcal{C}_{n})_{n}\ P_{X}\text{\emph{-typical}},P_{e}^{(n)}=0}% \quad\frac{1}{n}\log|\mathcal{C}_{n}|.

(34)
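The typicality condition (33) can be checked numerically for a given codebook. The Python sketch below computes the type $T_{x^{n}}$ of each codeword and the largest sup-norm distance to $P_{X}$ over the codebook; the toy codebook and the helper names are illustrative assumptions and do not come from the paper.

```python
from collections import Counter

def type_of(sequence, alphabet):
    """Type (empirical distribution) of a sequence over the given alphabet."""
    n = len(sequence)
    counts = Counter(sequence)
    return {x: counts[x] / n for x in alphabet}

def sup_distance(t, p):
    """Sup-norm distance between two distributions on the same alphabet."""
    return max(abs(t[x] - p[x]) for x in p)

def worst_deviation(codebook, p):
    """Left-hand side of (33) at a fixed block length: max over codewords of ||T - P||."""
    return max(sup_distance(type_of(word, p.keys()), p) for word in codebook)

p_x = {"0": 0.5, "1": 0.5}
codebook = ["0011", "0101", "0001"]   # toy codebook of block length 4
deviation = worst_deviation(codebook, p_x)
print(deviation)
```

A sequence of codes is $P_{X}$ -typical when this quantity vanishes as the block length grows.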

Csiszár and Körner [24, Eq. (3.2)] derive an asymptotic expression of this capacity, which we review below.

Lemma 4 (from [24]).

The zero-error capacity of the graph $G=(\mathcal{X},\mathcal{E})$ relative to $P_{X}$ is

\displaystyle C^{\star}(P_{X})=C(G,P_{X})\doteq\lim_{\varepsilon\rightarrow 0}% \limsup_{n\rightarrow\infty}\frac{1}{n}\log\alpha\big{(}G^{\wedge n}[\mathcal{% T}^{n}_{\varepsilon}(P_{X})]\big{)},

(35)

where $G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]$ is the subgraph of $G^{\wedge n}$ induced by the set of typical sequences $\mathcal{T}^{n}_{\varepsilon}(P_{X})$ with tolerance $\varepsilon>0$ .

The limit superior in (35) can be replaced by a regular limit, thanks to the superadditivity of the sequence $\big{(}\frac{1}{n}\log\alpha\big{(}G^{\wedge n}[\mathcal{T}^{n}_{\varepsilon}(P_{X})]\big{)}\big{)}_{n\in\mathbb{N}^{\star}}$ .

In [26, Lemma 1], Marton established the connection between the complementary graph entropy $\overline{H}$ and $C(\cdot,P_{X})$ .

Theorem 9 (from [26, Lemma 1]).

Given the graph $G=(\mathcal{X},\mathcal{E})$ with the probability distribution $P_{X}$ ,

\displaystyle C(G,P_{X})+\overline{H}(G)=H(X).

(36)

We can interpret the formula in Theorem 9 in the following way. The quantities $\overline{H}(G)$ and $C(G,P_{X})$ are the rates associated respectively with the minimum number of colors and with the maximum size of an independent set. A color class, i.e. the set of vertices of the same color, is an independent subset of vertices: in the case with same-sized color classes we would need $n\log\alpha(G)$ bits to describe the source sequence within its color class. Therefore, $C(G,P_{X})$ can be seen as the average number of bits needed to describe the index of the source sequence in its color class. These two quantities sum up to $H(X)$ , which is the information needed to describe the source sequence with zero error. Equation (36) can be seen as a zero-error analog of the formula $I(X;Y)+H(X|Y)=H(X)$ .

We establish below the connection between the linearization properties of $\overline{H}$ and of $C(\cdot,P_{X})$ , where the equivalences in (38) and (40) follow from Marton’s formula in Theorem 9, and the equivalence (39) from Theorem 4. The complete proof is in App. B.

Proposition 5.

	$\displaystyle C\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum_{a\in% \mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C(% G_{a},P_{X_{a}})$	(37)
$\displaystyle\Longleftrightarrow\;$	$\displaystyle\overline{H}\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}\right)% =\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a})$	(38)
$\displaystyle\Longleftrightarrow\;$	$\displaystyle\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a% \in\mathcal{A}}\overline{H}(G_{a})$	(39)
$\displaystyle\Longleftrightarrow\;$	$\displaystyle C\left(\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in% \mathcal{A}}P_{X_{a}}\right)=\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}}).$	(40)

III-D Capacity achieving distributions and equivalence of the linearizations of $C_{0}$ and $\overline{H}$ for the AND product

From the previous equivalences, we present our second main contribution which shows the equivalence between the linearization properties of $\overline{H}$ and $C_{0}$ . A key element is the set of input distributions $P_{X}$ that achieve the zero-error capacity in the sense that $C_{0}(G)=C(G,P_{X})$ . As in the vanishing error regime, it seems optimal to consider codebooks composed of codewords that are typical with respect to the input distribution $P_{X}$ that maximizes $C(G,P_{X})$ .

Theorem 10 (from [25, Theorem 2]).

For every graph $G=(\mathcal{X},\mathcal{E})$ ,

\displaystyle C_{0}(G)=\max_{P_{X}\in\Delta(\mathcal{X})}C(G,P_{X}).

(41)

The result of [25, Theorem 2], see also [7, Theorem 11.22], is stated for the Sperner capacity of a family of directed graphs, while the statement of [35, Theorem 13.68] is specific to the zero-error capacity of a family of graphs. For the sake of completeness, in App. C we provide a proof for Theorem 10 that does not rely on directed graphs. As a consequence of [26, Lemma 1] and of [25, Theorem 2], the zero-error capacity can be rewritten as

\displaystyle C_{0}(G)=\max_{P_{X}\in\Delta(\mathcal{X})}\Big{(}H(X)-\overline% {H}(G)\Big{)}.

(42)

In order to show the equivalence of the linearization properties between $C_{0}$ and $C(\cdot,P_{X})$ , we define the set of capacity-achieving distributions.

Definition 11.

Let $G=(\mathcal{X},\mathcal{E})$ be a graph. The set of capacity-achieving distributions of $G$ is the subset of $\Delta(\mathcal{X})$ defined by

\displaystyle\mathcal{P}^{\star}(G)\doteq\operatorname*{\arg\!\max}_{P_{X}\in% \Delta(\mathcal{X})}\;C(G,P_{X}).

(43)

Proposition 6.

For every graph $G$ , the mapping $P_{X}\mapsto C(G,P_{X})$ is concave and the set of capacity-achieving distributions $\mathcal{P}^{\star}(G)$ is convex and nonempty.

The proof of Proposition 6 is stated in App. D, and relies on Theorem 10.

The following theorem is essential for demonstrating the equivalence of the linearizations depicted in Fig. 7. It establishes that if a joint distribution achieves capacity, then the product of its marginals also achieves it.

Theorem 11.

If $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}(\bigwedge_{a\in\mathcal{% A}}G_{a})$ , then $\bigotimes_{a\in\mathcal{A}}P_{X_{a}}\in\mathcal{P}^{\star}(\bigwedge_{a\in% \mathcal{A}}G_{a})$ .

The proof of Theorem 11 is given in App. E and relies on a codebook shifting argument: given a codebook composed of codewords $(x_{1}^{n},x_{2}^{n})$ that are typical with respect to the joint distribution $P_{X_{1},X_{2}}$ , we construct a set of permuted codebooks by applying a cyclic permutation only to the first component $x_{1}^{n}$ of each codeword. We concatenate all the permuted codebooks and we replicate them $n$ times so that the codeword length is equal to $n^{\prime}=n^{3}$ . Then, we remove the codewords $(x_{1}^{n^{\prime}},x_{2}^{n^{\prime}})$ that are not typical with respect to the product of marginal distributions $P_{X_{1}}\otimes P_{X_{2}}$ . We show that this construction has the same rate and preserves the zero-error property. However, it modifies the types of the codewords, which become the product of marginals, as desired.

We can now establish the equivalence of the linearizations of $C_{0}$ and $C(\cdot,P_{X})$ for the AND product.

Theorem 12.

Let $\mathcal{A}$ be a finite set, and $(G_{a})_{a\in\mathcal{A}}=(\mathcal{X}_{a},\mathcal{E}_{a})_{a\in\mathcal{A}}$ be a family of graphs. The following equivalence holds:

		$\displaystyle C_{0}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in% \mathcal{A}}C_{0}(G_{a})$		(44)
	$\displaystyle\Longleftrightarrow\qquad$	$\displaystyle\exists P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)\!,\qquad C\left(\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)=\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}}).$		(45)

Furthermore, any distribution $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\bigwedge_{a\in% \mathcal{A}}G_{a}\right)$ that satisfies (45) also satisfies $P_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ for all $a\in\mathcal{A}$ .

The proof of Theorem 12 is given in App. G.

III-E Linearization of the sum of independent channels

Similarly to the side-information problem, see Sec. II-D, the disjoint union of graphs introduced in Proposition 5 for zero-error channel coding has an operational interpretation as a sum of channels, as depicted in Fig. 10. At each time step $t\leq n$ , the encoder uses one channel $a_{t}$ among the $|\mathcal{A}|$ channels. The decoder observes an output, deduces the chosen channel, and retrieves its input. Since the output alphabets of each individual channel are disjoint, the channel output symbol uniquely identifies the channel that is used.

Figure 10: Sum of the $|\mathcal{A}|$ channels $(P_{Y_{a}|X_{a}})_{a\in\mathcal{A}}$ : only the channel $a_{t}\in\mathcal{A}$ is used at instant $t\in\{1,\ldots,n\}$ .

In the vanishing error regime, the linearization of the capacity for the sum of channels holds since

\displaystyle C=\log\bigg{(}\sum_{a\in\mathcal{A}}2^{C_{a}}\bigg{)},

(46)

where $C_{a}\doteq\max_{P_{X_{a}}}I(X_{a};Y_{a})$ is the capacity of the channel $P_{Y_{a}|X_{a}}$ .

Proposition 7 (from [3]).

The zero-error capacity of the sum of channels is given by

\displaystyle C_{0}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right).

(47)

In the zero-error regime, Shannon in [3, Theorem 4] shows that

\displaystyle\forall G,G^{\prime},\quad\log\left(2^{C_{0}(G)}+2^{C_{0}(G^{\prime})}\right)\leq C_{0}(G\sqcup G^{\prime}).

(48)

For the sum of channels, a natural coding scheme consists in using the optimal codebooks for each channel in a time-sharing manner, with respect to the distribution $P_{A}$ that maximizes $H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C_{0}(G_{a})$ . In other words, with this strategy, communicating over the sum channel is equivalent to sending two types of information: one that identifies the chosen channel, $H(P_{A})$ , and the other that is conveyed over this channel, $C_{0}(G_{a})$ .

Lemma 5.

The mapping $P_{A}\mapsto H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C_{0}(G_{a})$ has a unique maximizer

\displaystyle P^{\star}_{A}\doteq\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime}\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}},

(49)

which gives

\displaystyle H(P^{\star}_{A})+\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)C_{0}(G_{a})=\log\left(\sum_{a^{\prime}\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}\right).

(50)

The proof of Lemma 5 is given in App. F-B and relies on the fact that the function $(w_{a})_{a\in\mathcal{A}}\mapsto\log\left(\sum_{a\in\mathcal{A}}2^{w_{a}}\right)$ is the Legendre-Fenchel conjugate [36] of the entropy function $P_{A}\mapsto H(P_{A})$ .

We consider the time-sharing strategy between the optimal codebooks along with the distribution $P^{\star}_{A}\in\Delta(\mathcal{A})$ defined in (49). If this strategy is optimal, then

\displaystyle C_{0}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=\log\left(% \sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right),

(51)

which means that the linearization property holds for the disjoint union of graphs.

Remark 2.

Note that $P^{\star}_{A}\in\Delta(\mathcal{A})$ has full support: the function $P_{A}\mapsto H(P_{A})$ has an infinite slope at the boundary of $\Delta(\mathcal{A})$ , consequently the maximizer of $P_{A}\mapsto H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C_{0}(G_{a})$ is always an interior point. In other words, the information carried by the channel index $H(P_{A})$ offsets the loss in rate, provided that the channels with smaller capacities are not chosen too often. Therefore, in the sum of channels setting, always choosing the channel with highest capacity is suboptimal, and never choosing a channel is also suboptimal, even if this channel has zero-error capacity equal to $0$ .
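Lemma 5 and the remark above can be checked numerically on a two-channel example. The Python sketch below takes one channel with capacity $\frac{1}{2}\log 5$ (the value of $C_{0}(C_{5})$ recalled in Sec. III-A) and one channel with zero capacity (a single-vertex graph), with logarithms in base 2. It evaluates the objective at the distribution (49), compares it with the closed form (50), and compares it with the strategy that always uses the best channel. The function names are illustrative.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(q * log2(q) for q in p if q > 0)

def objective(p, c):
    """H(P_A) + sum_a P_A(a) c_a, where c_a stands for C_0(G_a)."""
    return entropy(p) + sum(q * w for q, w in zip(p, c))

def maximizer(c):
    """Distribution (49): P_A(a) proportional to 2^{c_a}."""
    weights = [2.0 ** w for w in c]
    z = sum(weights)
    return [w / z for w in weights]

c = [0.5 * log2(5), 0.0]          # C_0(C_5) and a zero-capacity channel
p_star = maximizer(c)
best = objective(p_star, c)
closed_form = log2(sum(2.0 ** w for w in c))
only_best_channel = objective([1.0, 0.0], c)

# Coarse grid search over the simplex, used only as a sanity check.
grid_max = max(objective([t / 1000, 1 - t / 1000], c) for t in range(1001))
print(p_star, best, closed_form, only_best_channel, grid_max)
```

In this example the zero-capacity channel receives positive probability under $P^{\star}_{A}$ , and the resulting value exceeds the one obtained by using the pentagon channel alone.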

Similarly to Theorem 12, we establish the equivalence of the linearization properties of $C_{0}$ and $C(\cdot,P_{X})$ for the disjoint union of a family of graphs $(G_{a})_{a\in\mathcal{A}}$ .

Theorem 13.

The following equivalence holds

		$\displaystyle C_{0}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=\log\left(% \sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$		(52)
	$\displaystyle\Longleftrightarrow\qquad$	$\displaystyle\exists P_{X}\in\mathcal{P}^{\star}\left(\bigsqcup_{a\in\mathcal{% A}}G_{a}\right)\!,\qquad C\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;% \sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\sum_{a\in\mathcal{A}}% P_{A}(a)C(G_{a},P_{X_{a}}),$		(53)

where $P_{X_{a}}=P_{X|X\in\mathcal{X}_{a}}$ and $P_{A}(a)=P_{X}(\mathcal{X}_{a})$ for all $a\in\mathcal{A}$ . Furthermore, any $\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}$ that satisfies (53) also satisfies the following for all $a\in\mathcal{A}$ :

\displaystyle P_{A}(a)=\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime}\in\mathcal{A}}% 2^{C_{0}(G_{a^{\prime}})}},\text{ and }P_{X_{a}}\in\mathcal{P}^{\star}(G_{a}).

(54)

The proof of Theorem 13 is given in App. H. This result relies on Lemma 5 which proves that the distribution $P_{A}(a)=\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime}\in\mathcal{A}}2^{C_{0}(G_{a^% {\prime}})}}$ maximizes $P_{A}\mapsto H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C_{0}(G_{a})$ .

Remark 3.

One could think of a possible strategy for proving Theorem 13, which is successively using the equivalences in Theorem 8, Theorem 12, and Proposition 5. However, doing so yields the following statement

	$\displaystyle\textstyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}% \right)=\log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$	(55)
$\displaystyle\Longleftrightarrow\;$	$\displaystyle\exists P_{A}\in\Delta(\mathcal{A})\text{ full-support},\;\exists P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)\!,$	(56)
	$\displaystyle\textstyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_% {a},\;\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\sum_{a\in% \mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a}}),$

where it remains to link the sets of capacity achieving distributions $\mathcal{P}^{\star}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right)$ and $\mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)$ .

Theorem 13, together with Theorem 12, Theorem 8 and Proposition 5, establishes the equivalence of the linearization property between $C_{0}$ , $C(\cdot,P_{X})$ and $\overline{H}$ for the AND product $\wedge$ , and for the disjoint union $\sqcup$ of a family of graphs $(G_{a})_{a\in\mathcal{A}}$ , as depicted in Fig. 7.

IV Main Example and Counterexamples for the Linearization of Optimal Rates

In this section, we exploit the equivalences in the linearization depicted in Fig. 7, in order to provide single-letter characterization of $\overline{H}$ and $C_{0}$ for several new classes of graphs.

IV-A Perfect graphs

We show that perfect graphs allow for linearization of $C_{0}$ , $C(\cdot,P_{X})$ and $\overline{H}$ with respect to both $\sqcup$ and $\wedge$ with any underlying probability distribution. Perfect graphs are among the few known examples of graphs with a single-letter formula for $\overline{H}$ and $C_{0}$ . Theorem 4 allows us to provide new single-letter characterizations of $C_{0}$ , $C(\cdot,P_{X})$ and $\overline{H}$ for products of perfect graphs, which are not perfect in general.

Definition 12 (Graph complement, clique number $\omega$ ).

For all $G=(\mathcal{X},\mathcal{E})$ , the complementary graph of $G$ is defined by $\overline{G}\doteq(\mathcal{X},\mathcal{E}^{c})$ . The clique number of $G$ is defined by $\omega(G)\doteq\alpha(\overline{G})$ , where the independence number $\alpha$ is stated in Definition 9.

Definition 13 (Perfect graph).

A graph $G=(\mathcal{X},\mathcal{E})$ is perfect if for every subset of vertices $\mathcal{S}\subseteq\mathcal{X},\;\chi(G[\mathcal{S}])=\omega(G[\mathcal{S}])$ . A probabilistic graph $(\mathcal{X},\mathcal{E},P_{X})$ is perfect if $(\mathcal{X},\mathcal{E})$ is perfect.

A remarkable property of perfect graphs is their single-letter characterizations for zero-error problems. For example, as stated in Theorem 14, for the side-information problem, the optimal rate $\overline{H}(G)$ equals the Körner graph entropy, defined below and introduced in [19], when $G$ is a perfect graph.

Definition 14 (Körner graph entropy $H_{\kappa}$ ).

For all $G=(\mathcal{X},\mathcal{E},P_{X})$ , let $\Gamma(G)$ be the collection of independent sets of vertices in $G$ . The Körner graph entropy of $G$ is defined by

\displaystyle H_{\kappa}(G)=\min_{X\in W\in\Gamma(G)}I(W;X),

(57)

where the minimum is taken over all distributions $P_{W|X}$ with the constraint that the random vertex $X$ belongs to the random independent set $W$ with probability one, i.e. $X\in W\in\Gamma(G)$ in (57).

Theorem 14 (from [37, Corollary 12]).

Let $G$ be a perfect probabilistic graph, then

\displaystyle\overline{H}(G)=H_{\kappa}(G).

(58)

Similarly, a single-letter characterization holds for the zero-error capacity of perfect graphs, as stated below. This is a consequence of a more general result due to Shannon (see [3, Theorem 3]), which states that a graph $G$ whose vertex set can be partitioned into $\alpha(G)$ cliques, i.e. complete induced subgraphs, satisfies $C_{0}(G)=\log\alpha(G)$ . Perfect graphs satisfy this property, as their complement is also perfect and satisfies $\chi(\overline{G})=\omega(\overline{G})=\alpha(G)$ , where $\omega(\overline{G})$ is the clique number, see [9, p. 382].

Theorem 15 (from [3, Theorem 3]).

If $G$ is a perfect graph, then $C_{0}(G)=\log\alpha(G)$ .

We now derive single-letter characterizations of $C_{0}$ , $C(\cdot,P_{X})$ , and $\overline{H}$ for graphs where these quantities were previously unknown. These characterizations result from the linearization theorems: [22], [23] for $C_{0}$ , and Theorem 4 for $C(\cdot,P_{X})$ and $\overline{H}$ .

More precisely, consider some perfect graphs. Their disjoint union is perfect, as shown in Lemma 19, and $C_{0}$ linearizes, since $C_{0}(G\sqcup G^{\prime})=\log\alpha(G\sqcup G^{\prime})=\log(\alpha(G)+\alpha(G^{\prime}))=\log(2^{C_{0}(G)}+2^{C_{0}(G^{\prime})})$ holds for all perfect graphs $G,G^{\prime}$ . According to the linearization theorems [22], [23], since $C_{0}$ of the disjoint union linearizes, so does $C_{0}$ of the AND product: $C_{0}(G\wedge G^{\prime})=C_{0}(G)+C_{0}(G^{\prime})$ . This leads to the following proposition.

Proposition 8.

Let $G$ and $G^{\prime}$ be perfect graphs, then

	$\displaystyle C_{0}(G\sqcup G^{\prime})=\log\left(2^{C_{0}(G)}+2^{C_{0}(G^{% \prime})}\right)=\log(\alpha(G)+\alpha(G^{\prime})),$		(59)
	$\displaystyle C_{0}(G\wedge G^{\prime})=C_{0}(G)+C_{0}(G^{\prime})=\log\alpha(% G)+\log\alpha(G^{\prime}).$		(60)
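As a worked instance of Proposition 8, the following Python sketch evaluates (59) and (60) for the cycle graphs $C_{6}$ and $C_{8}$ . The independence numbers are obtained by exhaustive search over vertex subsets, and the capacities are then computed from $C_{0}(G)=\log\alpha(G)$ (Theorem 15), with logarithms in base 2. The helper names are illustrative, and the search is only meant for graphs of this size.

```python
from itertools import combinations
from math import log2

def cycle_edges(n, tag):
    """Edges of C_n, with vertices tagged so that two cycles stay disjoint."""
    return {frozenset(((tag, i), (tag, (i + 1) % n))) for i in range(n)}

def independence_number(vertices, edges):
    """Largest size of a vertex subset containing no edge (exhaustive search)."""
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                return size
    return 0

v6 = [(6, i) for i in range(6)]
v8 = [(8, i) for i in range(8)]
e6, e8 = cycle_edges(6, 6), cycle_edges(8, 8)

alpha_6 = independence_number(v6, e6)
alpha_8 = independence_number(v8, e8)
# Disjoint union: both vertex sets and both edge sets, no edge across components.
alpha_union = independence_number(v6 + v8, e6 | e8)

c0_union = log2(alpha_6 + alpha_8)          # right-hand side of (59)
c0_and = log2(alpha_6) + log2(alpha_8)      # right-hand side of (60)
print(alpha_6, alpha_8, alpha_union, c0_union, c0_and)
```

The sketch checks the identity $\alpha(G\sqcup G^{\prime})=\alpha(G)+\alpha(G^{\prime})$ on this pair; the value of $C_{0}(C_{6}\wedge C_{8})$ is taken from (60) rather than computed from AND powers.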

According to the previous proposition, $C_{0}(G\wedge G^{\prime})$ can now be computed for any pair of perfect graphs, as it linearizes. This result was previously unknown, as the AND product of perfect graphs is not necessarily perfect. For instance, cycle graphs $C_{6}$ and $C_{8}$ are perfect (due to the strong perfect graph theorem, mentioned below), but their AND product is not (also due to the strong perfect graph theorem); it contains an induced odd cycle of length 7, illustrated in Fig. 11.

Figure 11: A non-perfect AND product of perfect graphs: $C_{6}\wedge C_{8}$ with an induced $C_{7}$ .
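The presence of an induced cycle of length 7 in $C_{6}\wedge C_{8}$ can be verified mechanically. The Python sketch below checks that one particular sequence of seven vertices induces a cycle: consecutive vertices are adjacent and all other pairs are non-adjacent. The vertex sequence is an illustrative choice made for this example and need not coincide with the cycle drawn in Fig. 11; adjacency follows the convention that distinct pairs are adjacent when both coordinates are equal or cyclically consecutive.

```python
from itertools import combinations

def adjacent(u, v, n1=6, n2=8):
    """Adjacency in C_6 AND C_8 for vertices u = (i, j), v = (i', j')."""
    if u == v:
        return False
    d1 = min((u[0] - v[0]) % n1, (v[0] - u[0]) % n1)   # cyclic distance in C_6
    d2 = min((u[1] - v[1]) % n2, (v[1] - u[1]) % n2)   # cyclic distance in C_8
    return d1 <= 1 and d2 <= 1

def induces_cycle(seq):
    """True iff seq[0], ..., seq[k-1] induce a cycle visited in this order."""
    k = len(seq)
    for i, j in combinations(range(k), 2):
        consecutive = (j - i) in (1, k - 1)
        if adjacent(seq[i], seq[j]) != consecutive:
            return False
    return True

# Hypothetical candidate, chosen by hand for this sketch.
candidate = [(0, 1), (0, 2), (1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]
is_induced_c7 = len(set(candidate)) == 7 and induces_cycle(candidate)
print(is_induced_c7)
```

Since 7 is odd and at least 5, such an induced cycle rules out perfection by the strong perfect graph theorem stated below.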

Theorem 16 (Strong perfect graph theorem, from [38, Theorem 1.2]).

A graph $G$ is perfect if and only if neither $G$ nor $\overline{G}$ has an induced odd cycle of length at least 5.

Similarly, we show that the linearization property of $C(\cdot,P_{X})$ and $\overline{H}$ holds for perfect graphs and for all underlying probability distributions, and we provide new single-letter expressions for $\overline{H}$ and $C(\cdot,P_{X})$ in that case.

Theorem 17.

When $(G_{a})_{a\in\mathcal{A}}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})_{a\in% \mathcal{A}}$ is a family of perfect probabilistic graphs, we have the following single-letter characterizations:

$\displaystyle\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle=\sum_{a\in\mathcal{A}}\overline{H}(G_{a})=\sum_{a\in\mathcal{A}}% H_{\kappa}(G_{a}),$	(61)
$\displaystyle\overline{H}\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\right)$	$\displaystyle=\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a})=\sum_{a\in% \mathcal{A}}P_{A}(a)H_{\kappa}(G_{a}),$	(62)
$\displaystyle C\left(\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in% \mathcal{A}}P_{X_{a}}\right)$	$\displaystyle=\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})=\sum_{a\in\mathcal{A}}% \big{(}H(X_{a})-H_{\kappa}(G_{a})\big{)},$	(63)
$\displaystyle C\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a},\;\sum_{a\in% \mathcal{A}}P_{A}(a)P_{X_{a}}\right)$	$\displaystyle=H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a}})$
	$\displaystyle=H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)\Big{(}H(X_{a})-H_{\kappa% }(G_{a})\Big{)}.$	(64)

The proof of Theorem 17 is given in App. I. As an example, we consider below the AND product of the cycle graphs $C_{6}$ and $C_{8}$ .

Corollary 2.

Consider the cycle graphs $C_{6}$ and $C_{8}$ and denote by $P_{X_{6}}$ and $P_{X_{8}}$ the probability distributions on the vertices. We have

	$\displaystyle\overline{H}(C_{6}\wedge C_{8})=$	$\displaystyle H_{\kappa}(C_{6})+H_{\kappa}(C_{8}),$		(65)
	$\displaystyle C(C_{6}\wedge C_{8},P_{X_{6}}\otimes P_{X_{8}})=$	$\displaystyle H(P_{X_{6}})-H_{\kappa}(C_{6})+H(P_{X_{8}})-H_{\kappa}(C_{8}).$		(66)

We now explore the combination of a perfect graph with a non-perfect graph. More specifically, we consider the graph $C_{5}\sqcup G$ where $G$ is perfect, for which the linearization of $\overline{H}$ was studied by Tuncel et al. in [21]. The pentagon graph $C_{5}$ is not perfect, thereby making any disjoint union or AND product involving it non-perfect. However, Theorem 4 provides a non-perfect example where the linearization property holds, offering a single-letter characterization of $\overline{H}$ for the class of graphs $C_{5}\wedge G$ , where $G$ is perfect.

Theorem 18 (from [21, Lemma 3]).

Let $s\in[0,1]$ , let $G$ be a perfect probabilistic graph, and let $G_{5}\doteq(C_{5},\operatorname*{Unif}(\{0,...,4\}))$ , we have

	$\displaystyle\overline{H}(G_{5}\overset{(s,1-s)}{\sqcup}G)$	$\displaystyle=s\overline{H}(G_{5})+(1-s)\overline{H}(G)$		(67)
		$\displaystyle=\frac{s}{2}\log 5+(1-s)H_{\kappa}(G).$		(68)

Corollary 3.

For all perfect probabilistic graphs $G$ ,

\displaystyle\overline{H}(G\wedge G_{5})=\overline{H}(G)+\overline{H}(G_{5})=H% _{\kappa}(G)+\textstyle\frac{1}{2}\log 5.

(69)

IV-B Vertex transitive graphs

We study the important class of vertex-transitive graphs, where all the vertices of the graph play the same “role”, and show that the uniform distribution is capacity-achieving for these graphs.

Definition 15 (Vertex-transitive graph).

An automorphism of a graph $G=(\mathcal{X},\mathcal{E})$ is a bijection $\psi:\mathcal{X}\rightarrow\mathcal{X}$ such that for all $x,x^{\prime}\in\mathcal{X}$ , $xx^{\prime}\in\mathcal{E}$ if and only if $\psi(x)\psi(x^{\prime})\in\mathcal{E}$ . The group of automorphisms of $G$ is denoted by $\operatorname*{Aut}(G)$ .

A graph $G=(\mathcal{X},\mathcal{E})$ is vertex-transitive if $\operatorname*{Aut}(G)$ acts transitively on its vertices, i.e. for all $x,x^{\prime}\in\mathcal{X}$ , there exists $\psi\in\operatorname*{Aut}(G)$ such that $\psi(x)=x^{\prime}$ .
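Definition 15 can be checked directly on very small graphs by enumerating all vertex permutations. The sketch below is only an illustration: the function names are hypothetical, and the exhaustive search over $n!$ permutations is practical only for a handful of vertices.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All bijections psi of {0, ..., n-1} with xx' in E <=> psi(x)psi(x') in E."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for perm in permutations(range(n)):
        # A bijection maps the edge set onto itself iff it preserves
        # both adjacency and non-adjacency.
        image = {frozenset((perm[x], perm[y])) for x, y in edges}
        if image == edge_set:
            result.append(perm)
    return result

def is_vertex_transitive(n, edges):
    """True if for all x, y some automorphism maps x to y."""
    autos = automorphisms(n, edges)
    return all(any(psi[x] == y for psi in autos)
               for x in range(n) for y in range(n))

pentagon = [(i, (i + 1) % 5) for i in range(5)]  # the cycle C_5
path = [(0, 1), (1, 2), (2, 3)]                  # a path on 4 vertices

print(len(automorphisms(5, pentagon)))  # size of Aut(C_5)
print(is_vertex_transitive(5, pentagon), is_vertex_transitive(4, path))
```

The pentagon passes the test, while the path does not, since no automorphism maps an endpoint to an inner vertex.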

Proposition 9.

If $G=(\mathcal{X},\mathcal{E})$ is vertex-transitive, then

\displaystyle\operatorname*{Unif}(\mathcal{X})\in\mathcal{P}^{\star}(G).

(70)

The proof of Proposition 9 is given in App. F-A.

Corollary 4.

Let $(G_{a})_{a\in\mathcal{A}}=(\mathcal{X}_{a},\mathcal{E}_{a})_{a\in\mathcal{A}}$ be vertex-transitive graphs. Then their AND product is also vertex-transitive and

\displaystyle\operatorname*{Unif}\left(\prod_{a\in\mathcal{A}}\mathcal{X}_{a}% \right)=\bigotimes_{a\in\mathcal{A}}\operatorname*{Unif}(\mathcal{X}_{a})\in% \mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right).

(71)

IV-C The Schläfli graph

We now study the important case of the Schläfli graph $S$ as it offers a counterexample for the linearizations of all quantities studied in this paper. In [20], Haemers showed that the linearization property does not hold for the product of the Schläfli graph $S$ with its complement $\overline{S}$ ,

\displaystyle C_{0}(S)+C_{0}(\overline{S})<C_{0}(S\wedge\overline{S}).

(72)

More specifically, Haemers shows that $C_{0}(S)=\log_{2}(3)$ , $C_{0}(\overline{S})\leq\log_{2}(7)$ and $\log_{2}(27)\leq C_{0}(S\wedge\overline{S})$ . In this section, we show that a similar conclusion holds for $C(\cdot,P_{X})$ and for $\overline{H}$ .
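The strict inequality (72) follows from the three bounds above by simple arithmetic, which the following sketch spells out. The variable names are illustrative; the numerical values are those quoted in the text.

```python
from math import log2

c0_S = log2(3)             # C_0(S), value quoted in the text
c0_Sbar_upper = log2(7)    # upper bound on C_0 of the complement
c0_prod_lower = log2(27)   # lower bound on C_0 of the AND product

# Any gap between the lower bound on the product and the upper bound on the
# sum is a lower bound on C_0(S ∧ S̄) - C_0(S) - C_0(S̄).
gap_lower_bound = c0_prod_lower - (c0_S + c0_Sbar_upper)
print(gap_lower_bound)  # log2(27/21) = log2(9/7), about 0.36 bits
```

Joint coding over the two channels therefore gains at least $\log_{2}(9/7)$ bits per channel use over separate coding.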

According to [39, Lemma 3.7], the Schläfli graph $S$ and its complement $\overline{S}$ are vertex-transitive, as is their product $S\wedge\overline{S}$ . By Proposition 9, the uniform distribution is capacity-achieving for $S$ , $\overline{S}$ , and $S\wedge\overline{S}$ .

Corollary 5.

Consider the Schläfli graph $S$ and its complement $\overline{S}$ . Then,

	$\displaystyle C(S,\operatorname{Unif}(\mathcal{X}_{S}))=C_{0}(S),\qquad C(\overline{S},\operatorname{Unif}(\mathcal{X}_{\overline{S}}))=C_{0}(\overline{S}),$		(73)
	$\displaystyle C(S\wedge\overline{S},\operatorname{Unif}(\mathcal{X}_{S})% \otimes\operatorname{Unif}(\mathcal{X}_{\overline{S}}))=C_{0}(S\wedge% \overline{S}).$		(74)

In Theorem 19, we extend Haemers’s results of [20] and we show that the linearization property does not hold for $C(\cdot,P_{X})$ and $\overline{H}$ when the distribution on the vertices is uniform. By using Lemma 1, we also equalize $\overline{H}$ , and similarly $C_{0}$ , for the AND product $S\wedge\overline{S}$ and for the disjoint union $S\sqcup\overline{S}$ , up to a certain constant.

Theorem 19.

Let $s\in(0,1)$ , let $S$ be the Schläfli graph and $\overline{S}$ its complement, with uniform distributions on their vertices. Then,

$\displaystyle C(S\wedge\overline{S},\operatorname{Unif}(\mathcal{X}_{S})% \otimes\operatorname{Unif}(\mathcal{X}_{\overline{S}}))>\>$	$\displaystyle C(S,\operatorname{Unif}(\mathcal{X}_{S}))+C(\overline{S},% \operatorname{Unif}(\mathcal{X}_{\overline{S}})),$	(75)
$\displaystyle C(S\sqcup\overline{S},s\operatorname{Unif}(\mathcal{X}_{S})+(1-% s)\operatorname{Unif}(\mathcal{X}_{\overline{S}}))>\>$	$\displaystyle h_{b}(s)+sC(S,\operatorname{Unif}(\mathcal{X}_{S}))+(1-s)C(% \overline{S},\operatorname{Unif}(\mathcal{X}_{\overline{S}})),$	(76)
$\displaystyle\overline{H}(S\wedge\overline{S})<\>$	$\displaystyle\overline{H}(S)+\overline{H}(\overline{S}),$	(77)
$\displaystyle\overline{H}(S\overset{(s,1-s)}{\sqcup}\overline{S})<\>$	$\displaystyle s\overline{H}(S)+(1-s)\overline{H}(\overline{S});$	(78)

where $h_{b}$ is the binary entropy. Moreover, we have

	$\displaystyle\overline{H}(S\overset{(\frac{1}{2},\frac{1}{2})}{\sqcup}% \overline{S})=$	$\displaystyle\,\frac{1}{2}\overline{H}(S\wedge\overline{S}),$		(79)
	$\displaystyle C_{0}(S\sqcup\overline{S})=$	$\displaystyle\,1+\frac{1}{2}C_{0}(S\wedge\overline{S}).$		(80)

We obtain (75) from Theorem 12 and Corollary 5; (76) and (77) come from Proposition 5; and (78) comes from Theorem 4.

Remark 4.

Alon has built in [40] infinite families of graphs that satisfy $C_{0}(G\sqcup G^{\prime})>\log(2^{C_{0}(G)}+2^{C_{0}(G^{\prime})})$ . Similar results as in Theorem 19 can be derived for these graphs, by using their respective capacity-achieving distributions.

V Conclusion

We have shown the equivalence of the linearization properties of $C_{0}$ , $C(\cdot,P_{X})$ , and $\overline{H}$ , as depicted in Fig. 7. We proved the equivalence between the suboptimality of separated zero-error coding on independent channels and the suboptimality of separated compression of independent sources in the zero-error side-information setting, with the same characteristic graph and capacity-achieving distribution.

We also state the following open questions:

•

As pointed out in Lemma 11, for every capacity-achieving distribution of a product graph, the product of its marginals is also capacity-achieving. Are these marginals capacity-achieving for the respective graphs in the product? Conversely, is the product of capacity-achieving distributions of graphs capacity-achieving for the product of these graphs? In other words,

\displaystyle\mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)% \cap\bigotimes_{a\in\mathcal{A}}\Delta(\mathcal{X}_{a})\overset{?}{=}% \bigotimes_{a\in\mathcal{A}}\mathcal{P}^{\star}(G_{a}).

(81)

We gave a partial answer in Theorem 12, in the sense that inclusion holds when the linearization of the product holds.

•

We have shown in Theorem 12 and Theorem 13 that the linearization property of $C_{0}$ holds if and only if the linearization property of $C(\cdot,P_{X})$ holds, where $P_{X}$ is any capacity-achieving distribution. Can we find graphs such that the linearization property of $C(\cdot,P_{X})$ holds when $P_{X}$ is capacity-achieving, but does not hold for some $P_{X}$ that is not capacity-achieving? A negative answer would imply that the linearization property of $C_{0}$ is equivalent to the linearization property of $C(\cdot,P_{X})$ and $\overline{H}$ for all $P_{X}$ , similarly to perfect graphs.
•

Finally, we have seen in Corollary 3 that $\overline{H}\big{(}G\wedge G_{5}\big{)}$ with $G$ perfect is an example where the linearization property holds. Is the non-linearization property of $\overline{H}(\wedge\>\cdot)$ tied to specific non-perfect induced subgraphs in each graph in the product? And if so, can we find a minimal family of these graphs?

Appendix A Proof of Theorem 4

In order to prove Theorem 4, we will need Lemma 1, Lemma 2 and Lemma 6. The proof of Lemma 1 is a direct consequence of Lemma 3, which is proved in App. A-A. Lemma 2 gives regularity properties of $P_{A}\mapsto\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}$ and is proved in App. A-B. Lemma 6 states that if a convex function $\gamma$ on $\Delta(\mathcal{A})$ meets the linear interpolation of $(\gamma(\mathds{1}_{a}))_{a\in\mathcal{A}}$ at an interior point, where $(\mathds{1}_{a})_{a\in\mathcal{A}}$ are the extreme points of $\Delta(\mathcal{A})$ , then $\gamma$ is linear. The proof of Lemma 6 is given in App. A-C.

Lemma 6.

Let $\mathcal{A}$ be a finite set, and $\gamma:\Delta(\mathcal{A})\rightarrow\mathbb{R}$ be a convex function, and for all $a\in\mathcal{A}$ , let $\mathds{1}_{a}$ be the distribution that assigns 1 to the symbol $a$ and 0 to the others. Then the following holds:

		$\displaystyle\exists P_{A}\in\operatorname*{int}(\Delta(\mathcal{A})),\,\gamma% (P_{A})=\sum_{a\in\mathcal{A}}P_{A}(a)\gamma(\mathds{1}_{a})$		(82)
	$\displaystyle\Longleftrightarrow\;\;$	$\displaystyle\forall P_{A}\in\Delta(\mathcal{A}),\,\gamma(P_{A})=\sum_{a\in% \mathcal{A}}P_{A}(a)\gamma(\mathds{1}_{a})$		(83)

where $\operatorname*{int}(\Delta(\mathcal{A}))$ is the interior of $\Delta(\mathcal{A})$ (i.e. the full-support distributions on $\mathcal{A}$ ).

Now let us prove Theorem 4:

$(\Longrightarrow)$ Assume that $\overline{H}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\textstyle% \sum_{a\in\mathcal{A}}\overline{H}(G_{a})$ .

We can use Lemma 1, which states that $\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{\operatorname*{Unif}(\mathcal{% A})}G_{a}\big{)}=\frac{1}{|\mathcal{A}|}\overline{H}\left(\textstyle\bigwedge_% {a\in\mathcal{A}}G_{a}\right)$ , hence $\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{\operatorname*{Unif}(\mathcal{% A})}G_{a}\big{)}=\sum_{a\in\mathcal{A}}\frac{1}{|\mathcal{A}|}\overline{H}(G_{% a})$ . Thus, the function $\eta:P_{A}\mapsto\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}% \big{)}$ satisfies (82) with the interior point $P_{A}=\operatorname*{Unif}(\mathcal{A})$ , and is convex by Lemma 2: by Lemma 6 we have

\displaystyle\forall P_{A}\in\Delta(\mathcal{A}),\;\overline{H}\big{(}% \textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}=\textstyle\sum_{a\in% \mathcal{A}}P_{A}(a)\overline{H}(G_{a}).

(84)

$(\Longleftarrow)$ Conversely, assume (84), then $\eta:P_{A}\mapsto\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}% \big{)}$ is linear. We can use Lemma 1, and we have $\overline{H}\big{(}\bigwedge_{a\in\mathcal{A}}G_{a}\big{)}=|\mathcal{A}|% \overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{\operatorname*{Unif}(\mathcal{% A})}G_{a}\big{)}=\sum_{a\in\mathcal{A}}\overline{H}(G_{a})$ .

A-A Proof of Lemma 3

Lemma 3 is a consequence of a more general result, which considers probability distributions $P_{A}\in\Delta(\mathcal{A})$ instead of types $P_{A}\in\Delta_{k}(\mathcal{A})$ , for some $k\in\mathbb{N}^{\star}$ .

Lemma 7.

Let $P_{A}\in\Delta(\mathcal{A})$ be a probability distribution with full support and let $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}\in\mathcal{A}^{\mathbb{N}^{\star}}$ be any sequence such that its type $T_{\overline{a}^{n}}\rightarrow P_{A}$ when $n\rightarrow\infty$ . Then we have

\overline{H}\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\right)=\lim_{n% \rightarrow\infty}\frac{1}{n}H_{\chi}\left(\bigwedge_{a\in\mathcal{A}}G_{a}^{% \wedge nT_{\overline{a}^{n}}(a)}\right).

(85)

The proof of Lemma 7 is given in App. A-A1. Now let us prove Lemma 3. Let $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}$ be a $k$ -periodic sequence such that $T_{\overline{a}^{k}}=P_{A}$ . Then $T_{\overline{a}^{nk}}=T_{\overline{a}^{k}}$ for all $n\in\mathbb{N}^{\star}$ , and $T_{\overline{a}^{n}}\underset{n\rightarrow\infty}{\rightarrow}P_{A}$ . We can use Lemma 7 and consider every $k$ -th term in the limit:

$\displaystyle\overline{H}\Big{(}\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G% _{a}\Big{)}$	$\displaystyle=\lim\limits_{n\rightarrow\infty}\frac{1}{kn}H_{\chi}\Big{(}% \textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge knT_{\overline{a}^{kn}}(a)}% \Big{)}$	(86)
	$\displaystyle=\lim\limits_{n\rightarrow\infty}\frac{1}{kn}H_{\chi}\Big{(}\Big{% (}\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge kT_{\overline{a}^{k}}(a)}% \Big{)}^{\wedge n}\Big{)}$	(87)
	$\displaystyle=\frac{1}{k}\overline{H}\Big{(}\textstyle\bigwedge_{a\in\mathcal{% A}}G_{a}^{\wedge kP_{A}(a)}\Big{)}.$	(88)
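The two facts about $k$ -periodic sequences used in this argument can be checked on a toy example. The sketch below assumes a hypothetical alphabet $\{a,b\}$ and the period $(a,a,b)$ , so that $k=3$ and $P_{A}=(2/3,1/3)$ ; it verifies that every prefix of length $nk$ has type exactly $P_{A}$ and that the type of an arbitrary prefix of length $n$ is within $k/n$ of $P_{A}$ in sup norm.

```python
from collections import Counter
from fractions import Fraction

def type_of(seq, alphabet):
    """Empirical distribution (type) of a finite sequence, as exact fractions."""
    counts = Counter(seq)
    return {a: Fraction(counts[a], len(seq)) for a in alphabet}

alphabet = ("a", "b")
period = ("a", "a", "b")  # hypothetical example with k = 3
k = len(period)
target = type_of(period, alphabet)  # P_A = (2/3, 1/3)

def prefix(n):
    """First n symbols of the k-periodic sequence built from `period`."""
    return [period[t % k] for t in range(n)]

def sup_distance(n):
    """Sup-norm distance between the type of the length-n prefix and P_A."""
    t = type_of(prefix(n), alphabet)
    return max(abs(t[a] - target[a]) for a in alphabet)

print([str(sup_distance(n)) for n in range(1, 10)])
```

The distance vanishes along the subsequence $n\in k\mathbb{N}^{\star}$ , which is why restricting the limit in Lemma 7 to every $k$ -th term loses nothing.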

A-A1 Proof of Lemma 7

We need several lemmas for this result. Lemma 8 establishes the distributivity of $\wedge$ with respect to $\sqcup$ for probabilistic graphs, as in [41] for graphs without underlying distribution. Lemma 9 states that $\overline{H}$ can be computed with subgraphs induced by sets that have an asymptotic probability one; in particular, we will use it with typical sets of vertices. Lemma 10 gives the chromatic entropy of a disjoint union of isomorphic probabilistic graphs. The proofs of Lemma 8, Lemma 9 and Lemma 10 are respectively given in App. A-A2, App. A-A3, and App. A-A4.

Lemma 8.

Let $\mathcal{A},\mathcal{B}$ be finite sets, let $P_{A}\in\Delta(\mathcal{A})$ and $P_{B}\in\Delta(\mathcal{B})$ . For all $a\in\mathcal{A}$ and $b\in\mathcal{B}$ , let $G_{a}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})$ and $G_{b}=(\mathcal{X}_{b},\mathcal{E}_{b},P_{X_{b}})$ be probabilistic graphs. Then

\displaystyle\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\right)\wedge\left(% \bigsqcup_{b\in\mathcal{B}}^{P_{B}}G_{b}\right)=\bigsqcup_{(a,b)\in\mathcal{A}% \times\mathcal{B}}^{P_{A}P_{B}}G_{a}\wedge G_{b}.

(89)

Lemma 9.

Let $G=(\mathcal{X},\mathcal{E},P_{X})$ , and $(\mathcal{S}_{n})_{n\in\mathbb{N}^{\star}}$ be a sequence of sets such that for all $n\in\mathbb{N}^{\star}$ , $\mathcal{S}_{n}\subseteq\mathcal{X}^{n}$ , and $P^{n}_{X}(\mathcal{S}_{n})\rightarrow 1$ when $n\rightarrow\infty$ . Then $\overline{H}(G)=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}\big{(}G^{\wedge n% }[\mathcal{S}_{n}]\big{)}$ .

Definition 16 (Isomorphic probabilistic graphs).

Let $G_{1}=(\mathcal{X}_{1},\mathcal{E}_{1},P_{X_{1}})$ and $G_{2}=(\mathcal{X}_{2},\mathcal{E}_{2},P_{X_{2}})$ be two probabilistic graphs. We say that $G_{1}$ is isomorphic to $G_{2}$ (denoted by $G_{1}\simeq G_{2}$ ) if there exists an isomorphism between them, i.e. a bijection $\psi:\mathcal{X}_{1}\rightarrow\mathcal{X}_{2}$ such that:

•

For all $x_{1},x_{1}^{\prime}\in\mathcal{X}_{1}$ , $x_{1}x^{\prime}_{1}\in\mathcal{E}_{1}\Longleftrightarrow\psi(x_{1})\psi(x^{% \prime}_{1})\in\mathcal{E}_{2}$ ,
•

For all $x_{1}\in\mathcal{X}_{1}$ , $P_{X_{1}}(x_{1})=P_{X_{2}}\big{(}\psi(x_{1})\big{)}$ .

Lemma 10.

Let $\mathcal{B}$ be a finite set, let $P_{B}\in\Delta(\mathcal{B})$ and let $(G_{b})_{b\in\mathcal{B}}$ be a family of isomorphic probabilistic graphs, then $H_{\chi}\big{(}\bigsqcup_{b^{\prime}\in\mathcal{B}}^{P_{B}}G_{b^{\prime}}\big{% )}=H_{\chi}(G_{b})$ for all $b\in\mathcal{B}$ .

Now let us prove Lemma 7. Let $P_{A}\in\Delta(\mathcal{A})$ , and let $G=\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}$ . Let $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}\in\mathcal{A}^{\mathbb{N}^{\star}}$ be a sequence such that $T_{\overline{a}^{n}}\rightarrow P_{A}$ when $n\rightarrow\infty$ .

Let $\varepsilon>0$ , and for all $n\in\mathbb{N}^{\star}$ let

	$\displaystyle\mathcal{T}^{n}_{\varepsilon}(P_{A})\doteq\big{\{}a^{n}\in\mathcal{A}^{n}\>\big{|}\>\|T_{a^{n}}-P_{A}\|_{\infty}\leq\varepsilon\big{\}},$		(90)
	$\displaystyle P^{\prime n}\doteq\frac{P^{n}_{A}}{P^{n}_{A}(\mathcal{T}^{n}_{\varepsilon}(P_{A}))},\qquad\mathcal{S}_{n,\varepsilon}\doteq\bigsqcup_{a^{n}\in\mathcal{T}^{n}_{\varepsilon}(P_{A})}\;\prod_{t\leq n}\mathcal{X}_{a_{t}}.$

Since $P^{n}_{X}(\mathcal{S}_{n,\varepsilon})\rightarrow 1$ when $n\rightarrow\infty$ , we have by Lemma 9

\displaystyle\overline{H}(G)=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}\Big{(}G^{\wedge n}[\mathcal{S}_{n,\varepsilon}]\Big{)}.

(91)

Let us study the limit in (91). For all $n$ large enough, $\overline{a}^{n}\in\mathcal{T}^{n}_{\varepsilon}(P_{A})$ as $T_{\overline{a}^{n}}\rightarrow P_{A}$ . Therefore, for all $a^{n}\in\mathcal{T}^{n}_{\varepsilon}(P_{A})$ , $a^{\prime}\in\mathcal{A}$ , and $n$ large enough, we have

\displaystyle\big{|}T_{\overline{a}^{n}}(a^{\prime})-T_{a^{n}}(a^{\prime})\big% {|}\leq 2\varepsilon.

(92)
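Inequality (92) is a triangle inequality in sup norm between two types that are both $\varepsilon$ -close to $P_{A}$ . The sketch below checks it exhaustively on a small hypothetical instance (binary alphabet, $P_{A}$ uniform, $n=6$ , $\varepsilon=1/5$ ); all names are illustrative.

```python
from fractions import Fraction
from itertools import product

def type_of(seq, alphabet):
    """Empirical distribution (type) of a finite sequence, as exact fractions."""
    return {a: Fraction(seq.count(a), len(seq)) for a in alphabet}

def sup_dist(p, q):
    """Sup-norm distance between two distributions on the same alphabet."""
    return max(abs(p[a] - q[a]) for a in p)

def typical_set(n, p, eps):
    """All sequences of length n whose type is eps-close to p, as in (90)."""
    alphabet = tuple(p)
    return [seq for seq in product(alphabet, repeat=n)
            if sup_dist(type_of(seq, alphabet), p) <= eps]

p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}  # hypothetical P_A
eps = Fraction(1, 5)
typical = typical_set(6, p, eps)

# Largest sup-norm distance between the types of two typical sequences.
worst = max(sup_dist(type_of(s, tuple(p)), type_of(t, tuple(p)))
            for s in typical for t in typical)
print(len(typical), worst, 2 * eps)
```

Any pair of sequences in the typical set satisfies the bound, in particular the pair $(\overline{a}^{n},a^{n})$ used in the proof once $n$ is large enough for $\overline{a}^{n}$ to be typical.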

We have on one hand

	$\displaystyle H_{\chi}\Big{(}\big{(}\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{% A}}G_{a}\big{)}^{\wedge n}[\mathcal{S}_{n,\varepsilon}]\Big{)}$
$\displaystyle=\;$	$\displaystyle H_{\chi}\left(\left(\textstyle\bigsqcup_{a^{n}\in\mathcal{A}^{n}% }^{P_{A}^{n}}\;\textstyle\bigwedge_{t\leq n}G_{a_{t}}\right)[\mathcal{S}_{n,% \varepsilon}]\right)$	(93)
$\displaystyle=\;$	$\displaystyle H_{\chi}\left(\textstyle\bigsqcup_{a^{n}\in\mathcal{T}^{n}_{% \varepsilon}(P_{A})}^{P^{\prime n}}\;\textstyle\bigwedge_{t\leq n}G_{a_{t}}\right)$	(94)
$\displaystyle=\;$	$\displaystyle H_{\chi}\left(\textstyle\bigsqcup_{a^{n}\in\mathcal{T}^{n}_{% \varepsilon}(P_{A})}^{P^{\prime n}}\;\textstyle\bigwedge_{a^{\prime}\in% \mathcal{A}}G_{a^{\prime}}^{\wedge nT_{a^{n}}(a^{\prime})}\right)$	(95)
$\displaystyle\leq\;$	$\displaystyle H_{\chi}\left(\textstyle\bigsqcup_{a^{n}\in\mathcal{T}^{n}_{% \varepsilon}(P_{A})}^{P^{\prime n}}\;\textstyle\bigwedge_{a^{\prime}\in% \mathcal{A}}G_{a^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})+\lceil 2n% \varepsilon\rceil}\right)$	(96)
$\displaystyle=\;$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a% ^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})+\lceil 2n\varepsilon% \rceil}\right)$	(97)
$\displaystyle\leq\;$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a% ^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})}\right)+H_{\chi}\left(% \textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge\lceil 2n% \varepsilon\rceil}\right)$	(98)
$\displaystyle\leq\;$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})}\right)+\lceil 2n\varepsilon\rceil|\mathcal{A}|\log|\mathcal{X}|;$	(99)

where (93) comes from Lemma 8; (94) comes from the definition of $\mathcal{S}_{n,\varepsilon}$ and $P^{\prime n}$ in (90); (95) is a rearrangement of the terms inside the product; (96) comes from (92); (97) follows from Lemma 10, the graphs $\big{(}\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge nT_{% \overline{a}^{n}}(a^{\prime})+\lceil 2n\varepsilon\rceil}\big{)}_{a^{n}\in% \mathcal{T}^{n}_{\varepsilon}(P_{A})}$ are isomorphic as they do not depend on $a^{n}$ ; (98) follows from the subadditivity of $H_{\chi}$ ; and (99) is the upper bound on $H_{\chi}$ given by the highest entropy of a coloring.

On the other hand, we obtain with similar arguments

	$\displaystyle H_{\chi}\Big{(}\big{(}\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{% A}}G_{a}\big{)}^{\wedge n}[\mathcal{S}_{n,\varepsilon}]\Big{)}$
$\displaystyle\geq\>$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a% ^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})-\lceil 2n\varepsilon% \rceil}\right)$	(100)
$\displaystyle\geq\>$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a% ^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})}\right)-H_{\chi}\left(% \textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge\lceil 2n% \varepsilon\rceil}\right),$	(101)
$\displaystyle\geq\>$	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})}\right)-\lceil 2n\varepsilon\rceil|\mathcal{A}|\log|\mathcal{X}|.$	(102)

Note that (101) also comes from the subadditivity of $H_{\chi}$ , as $H_{\chi}(G_{2})\geq H_{\chi}(G_{1}\wedge G_{2})-H_{\chi}(G_{1})$ for all $G_{1},G_{2}$ .

By combining (99) and (102) we obtain

	$\displaystyle\left|\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}(G^{\wedge n}[\mathcal{S}_{n,\varepsilon}])-\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}\left(\textstyle\bigwedge_{a^{\prime}\in\mathcal{A}}G_{a^{\prime}}^{\wedge nT_{\overline{a}^{n}}(a^{\prime})}\right)\right|$
	$\displaystyle\leq 2\varepsilon|\mathcal{A}|\log|\mathcal{X}|.$		(103)

As this holds for all $\varepsilon>0$ , combining (91) and (103) yields the desired result.

A-A2 Proof of Lemma 8

The probabilistic graphs in both sides of (89) have

\displaystyle\left(\textstyle\bigsqcup_{a\in\mathcal{A}}\mathcal{X}_{a}\right)% \times\left(\textstyle\bigsqcup_{b\in\mathcal{B}}\mathcal{X}_{b}\right)=% \textstyle\bigsqcup_{(a,b)\in\mathcal{A}\times\mathcal{B}}\mathcal{X}_{a}% \times\mathcal{X}_{b}

(104)

as set of vertices, with underlying distribution

	$\displaystyle\left(\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)% \left(\textstyle\sum_{b\in\mathcal{B}}P_{B}(b)P_{X_{b}}\right)$
	$\displaystyle=\textstyle\sum_{(a,b)\in\mathcal{A}\times\mathcal{B}}P_{A}(a)P_{% B}(b)P_{X_{a}}P_{X_{b}}.$		(105)

Now let us show that these two graphs have the same edges. Let $(x_{\mathcal{A}},x_{\mathcal{B}}),(x^{\prime}_{\mathcal{A}},x^{\prime}_{% \mathcal{B}})\in\left(\bigsqcup_{a\in\mathcal{A}}\mathcal{X}_{a}\right)\times% \left(\bigsqcup_{b\in\mathcal{B}}\mathcal{X}_{b}\right)$ , let $a_{*},a^{\prime}_{*}\in\mathcal{A}$ and $b_{*},b^{\prime}_{*}\in\mathcal{B}$ be the unique indexes such that

\displaystyle(x_{\mathcal{A}},x_{\mathcal{B}})\in\mathcal{X}_{a_{*}}\times% \mathcal{X}_{b_{*}}\quad\text{and}\quad(x^{\prime}_{\mathcal{A}},x^{\prime}_{% \mathcal{B}})\in\mathcal{X}_{a^{\prime}_{*}}\times\mathcal{X}_{b^{\prime}_{*}}.

(106)

We have:

	$\displaystyle\!\!\!(x_{\mathcal{A}},x_{\mathcal{B}}),(x^{\prime}_{\mathcal{A}}% ,x^{\prime}_{\mathcal{B}})\text{ are adjacent in }\left(\textstyle\bigsqcup_{a% \in\mathcal{A}}^{P_{A}}G_{a}\right)\wedge\left(\textstyle\bigsqcup_{b\in% \mathcal{B}}^{P_{B}}G_{b}\right)$	(107)
$\displaystyle\Longleftrightarrow$	$\displaystyle\;x_{\mathcal{A}},x^{\prime}_{\mathcal{A}}\text{ adjacent in }% \textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\text{ and }x_{\mathcal{B}},% x^{\prime}_{\mathcal{B}}\text{ adjacent in }\textstyle\bigsqcup_{b\in\mathcal{% B}}^{P_{B}}G_{b}$	(108)
$\displaystyle\Longleftrightarrow$	$\displaystyle\;a_{*}=a^{\prime}_{*}\text{ and }x_{\mathcal{A}}x^{\prime}_{\mathcal{A}}\in\mathcal{E}_{a_{*}}\text{ and }b_{*}=b^{\prime}_{*}\text{ and }x_{\mathcal{B}}x^{\prime}_{\mathcal{B}}\in\mathcal{E}_{b_{*}}$	(109)
$\displaystyle\Longleftrightarrow$	$\displaystyle\;(a_{*},b_{*})=(a^{\prime}_{*},b^{\prime}_{*})\text{ and }(x_{\mathcal{A}},x_{\mathcal{B}}),(x^{\prime}_{\mathcal{A}},x^{\prime}_{\mathcal{B}})\text{ are adjacent in }G_{a_{*}}\wedge G_{b_{*}}$	(110)
$\displaystyle\Longleftrightarrow$	$\displaystyle\;(x_{\mathcal{A}},x_{\mathcal{B}}),(x^{\prime}_{\mathcal{A}},x^{% \prime}_{\mathcal{B}})\text{ are adjacent in }\textstyle\bigsqcup_{(a,b)\in% \mathcal{A}\times\mathcal{B}}^{P_{A}P_{B}}G_{a}\wedge G_{b}.$	(111)
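The identity (89) can also be checked mechanically on small instances. The sketch below is an illustration under stated assumptions: a probabilistic graph is represented as a pair (distribution, edge set), and the AND product is implemented with the convention that every vertex is adjacent to itself, which is an assumption about the definition given earlier in the paper. The example graphs, weights and function names are hypothetical.

```python
from fractions import Fraction as F

def edge(x, y):
    return frozenset((x, y))

def adjacent(edges, x, y):
    """Adjacency with the (assumed) convention that vertices are self-adjacent."""
    return x == y or edge(x, y) in edges

def disjoint_union(graphs, weights):
    """Weighted disjoint union; vertex x of component i is relabeled (i, x)."""
    prob, edges = {}, set()
    for i, (p, e) in graphs.items():
        for x, px in p.items():
            prob[(i, x)] = weights[i] * px
        for x, y in map(tuple, e):
            edges.add(edge((i, x), (i, y)))
    return prob, edges

def and_product(g1, g2):
    """AND product: product distribution, adjacency required in both coordinates."""
    (p1, e1), (p2, e2) = g1, g2
    prob = {(x, y): p1[x] * p2[y] for x in p1 for y in p2}
    edges = {edge(u, v) for u in prob for v in prob
             if u != v and adjacent(e1, u[0], v[0]) and adjacent(e2, u[1], v[1])}
    return prob, edges

GA = {0: ({0: F(1, 2), 1: F(1, 2)}, {edge(0, 1)}),
      1: ({0: F(1)}, set())}
GB = {0: ({0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}, {edge(0, 1), edge(1, 2)}),
      1: ({0: F(1, 3), 1: F(2, 3)}, set())}
PA = {0: F(1, 3), 1: F(2, 3)}
PB = {0: F(1, 4), 1: F(3, 4)}

left = and_product(disjoint_union(GA, PA), disjoint_union(GB, PB))
right = disjoint_union(
    {(a, b): and_product(GA[a], GB[b]) for a in GA for b in GB},
    {(a, b): PA[a] * PB[b] for a in PA for b in PB})

def relabel(graph):
    """Map left-hand vertices ((a, x), (b, y)) to right-hand labels ((a, b), (x, y))."""
    prob, edges = graph
    f = lambda v: ((v[0][0], v[1][0]), (v[0][1], v[1][1]))
    return ({f(v): p for v, p in prob.items()},
            {frozenset(f(v) for v in e) for e in edges})

print(relabel(left) == right)
```

Exact fractions are used so that the two distributions in (105) can be compared with strict equality rather than a floating-point tolerance.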

A-A3 Proof of Lemma 9

In order to prove Lemma 9, we need Lemma 11. In Lemma 11 we give upper and lower bounds on the chromatic entropy of an induced subgraph $G[\mathcal{S}]$ , using the chromatic entropy of the whole graph $G$ and the probability $P_{X}(\mathcal{S})$ . The core idea is that if $P_{X}(\mathcal{S})$ is close to $1$ and $H_{\chi}(G)$ is large, then $H_{\chi}(G[\mathcal{S}])$ is close to $H_{\chi}(G)$ . The proof of Lemma 11 is given in App. A-A5.

Lemma 11.

Let $G=(\mathcal{X},\mathcal{E},P_{X})$ and $\mathcal{S}\subseteq\mathcal{X}$ , then

\displaystyle H_{\chi}(G)-1-(1-P_{X}(\mathcal{S}))\log|\mathcal{X}|\leq H_{% \chi}(G[\mathcal{S}])\leq\frac{H_{\chi}(G)}{P_{X}(\mathcal{S})}.

(112)

Remark 5.

$H_{\chi}(G[\mathcal{S}])$ can be greater than $H_{\chi}(G)$ , even if $G[\mathcal{S}]$ has fewer vertices and inherits the structure of $G$ . This stems from the normalized distribution $P_{X}/P_{X}(\mathcal{S})$ on the vertices of $G[\mathcal{S}]$ , which gives more weight to the vertices in $\mathcal{S}$ . For example, consider

\displaystyle G=\big{(}N_{5},\operatorname*{Unif}(\{1,...,5\})\big{)}\overset{% (1-\varepsilon,\varepsilon)}{\sqcup}\big{(}K_{5},\operatorname*{Unif}(\{1,...,% 5\})\big{)};

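where $K_{n}$ (resp. $N_{n}$ ) is the complete (resp. empty) graph with $n$ vertices, i.e. there is an edge (resp. no edge) between any pair of distinct vertices, and with $\mathcal{S}$ being the vertices in the connected component $K_{5}$ in $G$ . An optimal coloring of $G$ uses five colors on $K_{5}$ and reuses one of them on all of $N_{5}$ , so that $H_{\chi}(G)=h_{b}\big{(}\tfrac{4\varepsilon}{5}\big{)}+\tfrac{4\varepsilon}{5}\log 4$ , which vanishes as $\varepsilon\rightarrow 0$ , whereas $H_{\chi}(G[\mathcal{S}])=\log 5$ .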

Now let us prove Lemma 9. By Lemma 11, we have for all $n\in\mathbb{N}^{\star}$ :

		$\displaystyle H_{\chi}(G^{\wedge n})-1-n(1-P^{n}_{X}(\mathcal{S}_{n}))\log|\mathcal{X}|$
	$\displaystyle\leq\>$	$\displaystyle H_{\chi}(G^{\wedge n}[\mathcal{S}_{n}])\leq\frac{H_{\chi}(G^{\wedge n})}{P^{n}_{X}(\mathcal{S}_{n})}.$		(113)

Since $P^{n}_{X}(\mathcal{S}_{n})\rightarrow 1$ and $H_{\chi}(G^{\wedge n})=n\overline{H}(G)+o(n)$ when $n\rightarrow\infty$ , the desired result follows by dividing (113) by $n$ and taking the limit.

A-A4 Proof of Lemma 10

Let $(\tilde{G}_{i})_{i\leq N}$ be isomorphic probabilistic graphs and $G$ such that $G=\bigsqcup_{i}\tilde{G}_{i}$ . Let $c_{1}^{\star}:\mathcal{X}_{1}\rightarrow\mathcal{C}$ be the coloring of $\tilde{G}_{1}$ with minimal entropy, and let $c^{\star}$ be the coloring of $G$ defined by

	$\displaystyle c^{\star}:\>$	$\displaystyle\mathcal{X}\rightarrow\mathcal{C}$		(114)
		$\displaystyle x\mapsto c_{1}^{\star}\circ\psi_{i_{x}\rightarrow 1}(x),$		(115)

where $i_{x}$ is the unique integer such that $x\in\mathcal{X}_{i_{x}}$ , and $\psi_{i_{x}\rightarrow 1}:\mathcal{X}_{i_{x}}\rightarrow\mathcal{X}_{1}$ is an isomorphism between $\tilde{G}_{i_{x}}$ and $\tilde{G}_{1}$ . In other words, $c^{\star}$ applies the same coloring pattern $c^{\star}_{1}$ on each connected component of $G$ . We have

$\displaystyle H_{\chi}(G)$	$\displaystyle\leq H(c^{\star}(X))$	(116)
	$\displaystyle=h\Big{(}\textstyle\sum_{j\leq N}P_{i_{X}}(j)P_{c^{\star}(X_{j})}% \Big{)}$	(117)
	$\displaystyle=h\Big{(}\textstyle\sum_{j\leq N}P_{i_{X}}(j)P_{c_{1}^{\star}(X_{% 1})}\Big{)}$	(118)
	$\displaystyle=H(c^{\star}_{1}(X_{1}))$	(119)
	$\displaystyle=H_{\chi}(\tilde{G}_{1}),$	(120)

where $h$ denotes the entropy of a distribution; (118) comes from the definition of $c^{\star}$ ; and (120) comes from the definition of $c_{1}^{\star}$ .

Now let us prove the upper bound on $H_{\chi}(\tilde{G}_{1})$ . Let $c$ be a coloring of $G$ , and let $i^{\star}\doteq\operatorname*{\arg\!\min}_{i}H(c(X_{i}))$ (i.e. $i^{\star}$ is the index of the connected component for which the entropy of the coloring induced by $c$ is minimal). We have

$\displaystyle H(c(X))$	$\displaystyle=h\Big{(}\textstyle\sum_{j\leq N}P_{i_{X}}(j)P_{c(X_{j})}\Big{)}$	(121)
	$\displaystyle\geq\textstyle\sum_{j\leq N}P_{i_{X}}(j)h(P_{c(X_{j})})$	(122)
	$\displaystyle\geq\textstyle\sum_{j\leq N}P_{i_{X}}(j)H(c(X_{i^{\star}}))$	(123)
	$\displaystyle\geq H_{\chi}(\tilde{G}_{i^{\star}}),$	(124)
	$\displaystyle=H_{\chi}(\tilde{G}_{1}),$	(125)

where (122) follows from the concavity of $h$ ; (123) follows from the definition of $i^{\star}$ ; (124) comes from the fact that $c$ induces a coloring of $\tilde{G}_{i^{\star}}$ ; and (125) comes from the fact that $\tilde{G}_{1}$ and $\tilde{G}_{i^{\star}}$ are isomorphic. Now, we can combine the bounds (120) and (125): for every coloring $c$ of $G$ we have

\displaystyle H_{\chi}(G)\leq H_{\chi}(\tilde{G}_{1})\leq H(c(X)),

(126)

which yields the desired equality when taking the infimum over $c$ .
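Lemma 10 can be illustrated numerically by computing chromatic entropies by exhaustive search. The sketch below is a toy check rather than a practical algorithm: `chromatic_entropy` enumerates all colorings with at most $|\mathcal{X}|$ colors, and the example (two copies of a weighted path on three vertices, mixed with hypothetical weights $P_{B}=(0.4,0.6)$ ) is chosen only for illustration.

```python
from itertools import product
from math import log2

def chromatic_entropy(prob, edges):
    """Minimum of H(c(X)) over proper colorings c, by exhaustive search."""
    vertices = list(prob)
    best = float("inf")
    for colors in product(range(len(vertices)), repeat=len(vertices)):
        c = dict(zip(vertices, colors))
        if any(c[x] == c[y] for x, y in edges):
            continue  # not a proper coloring
        mass = {}
        for v in vertices:
            mass[c[v]] = mass.get(c[v], 0.0) + prob[v]
        entropy = -sum(m * log2(m) for m in mass.values() if m > 0)
        best = min(best, entropy)
    return best

# One probabilistic graph: a path 0 - 1 - 2 with a non-uniform distribution.
path_prob = {0: 0.5, 1: 0.3, 2: 0.2}
path_edges = [(0, 1), (1, 2)]

# Disjoint union of two isomorphic copies, indexed by b in {0, 1}.
weights = [0.4, 0.6]
union_prob = {(b, x): weights[b] * p
              for b in range(2) for x, p in path_prob.items()}
union_edges = [((b, x), (b, y)) for b in range(2) for x, y in path_edges]

h_single = chromatic_entropy(path_prob, path_edges)
h_union = chromatic_entropy(union_prob, union_edges)
print(h_single, h_union)
```

Both values coincide up to floating-point error, in line with the construction of $c^{\star}$ , which repeats the optimal pattern of one component on every component.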

A-A5 Proof of Lemma 11

Let $c^{\star}:\mathcal{X}\rightarrow\mathcal{C}$ and $c^{\star}_{\mathcal{S}}:\mathcal{S}\rightarrow\mathcal{C}$ be the optimal colorings of $G$ and $G[\mathcal{S}]$ , respectively. Consider the coloring $c:\mathcal{X}\rightarrow\mathcal{C}\sqcup\mathcal{X}$ of $G$ defined by $c(x)=c^{\star}_{\mathcal{S}}(x)$ if $x\in\mathcal{S}$ , and $c(x)=x$ otherwise.

(Lower bound) On one hand, we have

$\displaystyle H_{\chi}(G)\leq$	$\displaystyle\>H(c(X),\mathds{1}_{X\in\mathcal{S}})$	(127)
$\displaystyle=$	$\displaystyle\>H(\mathds{1}_{X\in\mathcal{S}})+P_{X}(\mathcal{S})H(c(X)|X\in\mathcal{S})$
	$\displaystyle+(1-P_{X}(\mathcal{S}))H(c(X)|X\notin\mathcal{S})$	(128)
$\displaystyle\leq$	$\displaystyle\>1+H(c^{\star}_{\mathcal{S}}(X)|X\in\mathcal{S})+(1-P_{X}(\mathcal{S}))\log|\mathcal{X}|$	(129)
$\displaystyle=$	$\displaystyle\>H_{\chi}(G[\mathcal{S}])+1+(1-P_{X}(\mathcal{S}))\log|\mathcal{X}|;$	(130)

where (127) comes from the fact that $c$ is a coloring of $G$ ; (128) is a decomposition using conditional entropies; (129) comes from the construction of $c$ : $c|_{\mathcal{S}}=c^{\star}_{\mathcal{S}}$ ; (130) follows from the optimality of $c^{\star}_{\mathcal{S}}$ as a coloring of $G[\mathcal{S}]$ .

(Upper bound) On the other hand,

$\displaystyle H_{\chi}(G[\mathcal{S}])\leq$	$\displaystyle\>H(c^{\star}(X)|X\in\mathcal{S})$	(131)
$\displaystyle=$	$\displaystyle\,\frac{1}{P_{X}(\mathcal{S})}\Big{(}\!H(c^{\star}(X)|\mathds{1}_{X\in\mathcal{S}})-(1-P_{X}(\mathcal{S}))H(c^{\star}(X)|X\notin\mathcal{S})\!\Big{)}$	(132)
$\displaystyle\leq$	$\displaystyle\>\frac{H(c^{\star}(X))}{P_{X}(\mathcal{S})}=\frac{H_{\chi}(G)}{P_{X}(\mathcal{S})},$	(133)

where (131) comes from the fact that $c^{\star}$ induces a coloring of $G[\mathcal{S}]$ ; (132) is a decomposition using conditional entropies; (133) results from the elimination of negative terms and the optimality of $c^{\star}$ .
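The two bounds of (112) can be checked by brute force on a small graph in the spirit of Remark 5. The sketch below uses a hypothetical instance, $N_{3}\sqcup K_{3}$ with total weight $\varepsilon=0.1$ on $K_{3}$ and $\mathcal{S}$ the vertices of $K_{3}$ ; the helper names are illustrative and the exhaustive search is only viable for a few vertices.

```python
from itertools import product
from math import log2

def chromatic_entropy(prob, edges):
    """Minimum of H(c(X)) over proper colorings c, by exhaustive search."""
    vertices = list(prob)
    best = float("inf")
    for colors in product(range(len(vertices)), repeat=len(vertices)):
        c = dict(zip(vertices, colors))
        if any(c[x] == c[y] for x, y in edges):
            continue  # not a proper coloring
        mass = {}
        for v in vertices:
            mass[c[v]] = mass.get(c[v], 0.0) + prob[v]
        best = min(best, -sum(m * log2(m) for m in mass.values() if m > 0))
    return best

def induced(prob, edges, subset):
    """Induced subgraph G[S] with renormalized distribution P_X / P_X(S)."""
    mass = sum(prob[v] for v in subset)
    sub_prob = {v: prob[v] / mass for v in subset}
    sub_edges = [(x, y) for x, y in edges if x in subset and y in subset]
    return sub_prob, sub_edges, mass

eps = 0.1
# Vertices 0, 1, 2 form N_3 (total weight 1 - eps); 3, 4, 5 form K_3 (weight eps).
prob = {v: (1 - eps) / 3 for v in range(3)}
prob.update({v: eps / 3 for v in range(3, 6)})
edges = [(3, 4), (3, 5), (4, 5)]
S = {3, 4, 5}

h_G = chromatic_entropy(prob, edges)
sub_prob, sub_edges, p_S = induced(prob, edges, S)
h_GS = chromatic_entropy(sub_prob, sub_edges)

lower = h_G - 1 - (1 - p_S) * log2(len(prob))  # left-hand side of (112)
upper = h_G / p_S                              # right-hand side of (112)
print(lower, h_GS, upper, h_G)
```

On this instance the induced subgraph has a strictly larger chromatic entropy than the whole graph, while both bounds of Lemma 11 still hold.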

A-B Proof of Lemma 2

In order to prove Lemma 2 we need Lemma 7, which can be found in App. A-A1; and Lemma 12, which is a generalization for infinite sequences of the following observation: if $T_{\overline{a}^{n}}=P_{A}\in\Delta_{n}(\mathcal{A})$ satisfies $P_{A}=\frac{i}{n}P^{\prime}_{A}+\frac{n-i}{n}P^{\prime\prime}_{A}$ with $P^{\prime}_{A}\in\Delta_{i}(\mathcal{A})$ and $P^{\prime\prime}_{A}\in\Delta_{n-i}(\mathcal{A})$ , then $\overline{a}^{n}$ can be separated into two subsequences $a^{\prime i}$ and $a^{\prime\prime n-i}$ such that $T_{a^{\prime i}}=P^{\prime}_{A}$ and $T_{a^{\prime\prime n-i}}=P^{\prime\prime}_{A}$ .
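The finite-length observation above admits a simple greedy construction, sketched below. The function `split_by_type` is a hypothetical helper: it takes the symbol counts $iP^{\prime}_{A}(a)$ requested for the first subsequence and sends every remaining symbol to the second one. It covers only the finite case and is not the construction used for Lemma 12.

```python
from collections import Counter

def split_by_type(seq, first_counts):
    """Split seq into two subsequences, the first containing exactly
    first_counts[a] occurrences of each symbol a (taken greedily)."""
    remaining = Counter(first_counts)
    first, second = [], []
    for symbol in seq:
        if remaining[symbol] > 0:
            first.append(symbol)
            remaining[symbol] -= 1
        else:
            second.append(symbol)
    if any(remaining.values()):
        raise ValueError("seq does not contain the requested counts")
    return first, second

# n = 8 with counts (a, b, c) = (4, 2, 2); i = 4 with P'_A = (3/4, 1/4, 0),
# which forces P''_A = (1/4, 1/4, 1/2) on the remaining n - i = 4 symbols.
seq = list("aabacbca")
first, second = split_by_type(seq, {"a": 3, "b": 1})
print(first, second)
```

Both subsequences preserve the order of the original sequence, which is what makes them extracted sequences in the sense of Lemma 12.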

Lemma 12 (Type-splitting lemma).

Let $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}\in\mathcal{A}^{\mathbb{N}^{\star}}$ be a sequence such that $T_{\overline{a}^{n}}\rightarrow P_{A}\in\Delta(\mathcal{A})$ when $n\rightarrow\infty$ , let $\beta\in(0,1)$ and $P^{\prime}_{A},P^{\prime\prime}_{A}\in\Delta(\mathcal{A})$ such that

\displaystyle P_{A}=\beta P^{\prime}_{A}+(1-\beta)P^{\prime\prime}_{A}.

(134)

Then there exists a sequence $(b_{n})_{n\in\mathbb{N}^{\star}}\in\{0,1\}^{\mathbb{N}^{\star}}$ such that the two extracted sequences $a^{\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{\star},% \\ b_{n}=0\end{subarray}}$ and $a^{\prime\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{% \star},\\ b_{n}=1\end{subarray}}$ satisfy

	$\displaystyle T_{b^{n}}\underset{n\rightarrow\infty}{\rightarrow}(\beta,1-% \beta),$			(135)
	$\displaystyle T_{a^{\prime n}}\underset{n\rightarrow\infty}{\rightarrow}P^{% \prime}_{A},$	$\displaystyle T_{a^{\prime\prime n}}\underset{n\rightarrow\infty}{\rightarrow}% P^{\prime\prime}_{A}.$		(136)

The proof of Lemma 12 is given in App. A-B1. Now let us prove Lemma 2. We recall the definition of the function

\displaystyle\eta:P_{A}\mapsto\overline{H}\Bigg{(}\bigsqcup_{a\in\mathcal{A}}^% {P_{A}}G_{a}\Bigg{)}.

(137)

( $\eta$ Lipschitz) Let us first prove that the function $\eta$ is Lipschitz. For all $P_{A},P^{\prime}_{A}\in\Delta(\mathcal{A})$ we need to bound the quantity $|\eta(P_{A})-\eta(P^{\prime}_{A})|$ ; by Lemma 7 this is equivalent to bounding

\displaystyle\lim_{n\rightarrow\infty}\frac{1}{n}\left|H_{\chi}\left(% \textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge nT_{\overline{a}^{n}}(a)}% \right)-H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge nT_{% \overline{a}^{\prime n}}(a)}\right)\right|

(138)

where $(T_{\overline{a}^{n}},T_{\overline{a}^{\prime n}})\rightarrow(P_{A},P^{\prime}% _{A})$ when $n\rightarrow\infty$ .

Fix $n\in\mathbb{N}^{\star}$ and assume that the quantity inside $|\cdot|$ in (138) is nonnegative; the other case follows from the same arguments by exchanging the roles of the two sequences. We have

	$\displaystyle H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge nT% _{\overline{a}^{n}}(a)}\right)-H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal% {A}}G_{a}^{\wedge nT_{\overline{a}^{\prime n}}(a)}\right)$	(139)
$\displaystyle\leq$	$\displaystyle\;H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{% \wedge nT_{\overline{a}^{n}}(a)}\right)-H_{\chi}\left(\textstyle\bigwedge_{a% \in\mathcal{A}}G_{a}^{\wedge n\min(T_{\overline{a}^{n}}(a),T_{\overline{a}^{% \prime n}}(a))}\right)$	(140)
$\displaystyle=$	$\displaystyle\;H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge n\min(T_{\overline{a}^{n}}(a),T_{\overline{a}^{\prime n}}(a))}\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge n|T_{\overline{a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|_{+}}\right)$
	$\displaystyle-H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge n\min(T_{\overline{a}^{n}}(a),T_{\overline{a}^{\prime n}}(a))}\right)$	(141)
$\displaystyle\leq$	$\displaystyle\;H_{\chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge n|T_{\overline{a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|_{+}}\right)$	(142)
$\displaystyle\leq$	$\displaystyle\;\log\left(\max_{a}|\mathcal{X}_{a}|\right)\textstyle\sum_{a\in\mathcal{A}}n|T_{\overline{a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|_{+}$	(143)
$\displaystyle\leq$	$\displaystyle\;n\log\left(\max_{a}|\mathcal{X}_{a}|\right)\|T_{\overline{a}^{n}}-T_{\overline{a}^{\prime n}}\|_{1},$	(144)

where $|\cdot|_{+}=\max(\cdot,0)$ and $\|T_{\overline{a}^{n}}-T_{\overline{a}^{\prime n}}\|_{1}=\sum_{a\in\mathcal{A}% }|T_{\overline{a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|$ ; (140) follows from the removal of terms in the second product, as $H_{\chi}(G\wedge G^{\prime})\geq H_{\chi}(G)$ for all probabilistic graphs $G,G^{\prime}$ ; (141) is an arrangement of the terms in the first product, as $\min(s,t)+\max(s-t,0)=s$ for all real numbers $s,t$ ; (142) comes from the subadditivity of $H_{\chi}$ ; (143) follows from $H_{\chi}(G_{a})\leq\log\max_{a^{\prime}}|\mathcal{X}_{a^{\prime}}|$ for all $a\in\mathcal{A}$ ; (144) results from $|T_{\overline{a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|_{+}\leq|T_{\overline{% a}^{n}}(a)-T_{\overline{a}^{\prime n}}(a)|$ for all $a\in\mathcal{A}$ .

By normalization and limit, it follows that

	$\displaystyle|\eta(P_{A})-\eta(P^{\prime}_{A})|$	$\displaystyle\leq\lim_{n\rightarrow\infty}\log\left(\max_{a}|\mathcal{X}_{a}|\right)\cdot\|T_{\overline{a}^{n}}-T_{\overline{a}^{\prime n}}\|_{1}$		(145)
		$\displaystyle=\log\left(\max_{a}|\mathcal{X}_{a}|\right)\cdot\|P_{A}-P^{\prime}_{A}\|_{1}.$		(146)

Hence $\eta$ is $(\log\max_{a}|\mathcal{X}_{a}|)$ -Lipschitz.

( $\eta$ convex) Let us now prove that $\eta$ is convex. Let $P^{\prime}_{A},P^{\prime\prime}_{A}\in\Delta(\mathcal{A})$ and $\beta\in(0,1)$ . By Lemma 7, we have

\displaystyle\eta\big{(}\beta P^{\prime}_{A}+(1-\beta)P^{\prime\prime}_{A}\big% {)}=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}\left(\textstyle\bigwedge_{a% \in\mathcal{A}}G_{a}^{\wedge nT_{\overline{a}^{n}}(a)}\right),

(147)

where $T_{\overline{a}^{n}}\rightarrow\beta P^{\prime}_{A}+(1-\beta)P^{\prime\prime}_% {A}$ when $n\rightarrow\infty$ . By Lemma 12, there exists $(b_{n})_{n\in\mathbb{N}^{\star}}\in\{0,1\}^{\mathbb{N}^{\star}}$ such that the decomposition of $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}$ into two subsequences $a^{\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{\star},% \\ b_{n}=0\end{subarray}}$ and $a^{\prime\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{% \star},\\ b_{n}=1\end{subarray}}$ satisfies

	$\displaystyle T_{b^{n}}\underset{n\rightarrow\infty}{\rightarrow}(\beta,1-% \beta),$			(148)
	$\displaystyle T_{a^{\prime n}}\underset{n\rightarrow\infty}{\rightarrow}P^{% \prime}_{A},$	$\displaystyle T_{a^{\prime\prime n}}\underset{n\rightarrow\infty}{\rightarrow}% P^{\prime\prime}_{A}.$		(149)

For all $n\in\mathbb{N}^{\star}$ , let $\Xi(n)\doteq nT_{b^{n}}(0)$ be the number of indices $t\leq n$ such that $b_{t}=0$ . We have

	$\displaystyle\eta\big{(}\beta P^{\prime}_{A}+(1-\beta)P^{\prime\prime}_{A}\big% {)}$	(150)
$\displaystyle=$	$\displaystyle\;\lim_{n\rightarrow\infty}\frac{1}{n}H_{\chi}\left(\textstyle% \bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge\Xi(n)T_{a^{\prime\Xi(n)}}(a)+(n-\Xi(n% ))T_{a^{\prime\prime n-\Xi(n)}}(a)}\right)$	(151)
$\displaystyle\leq$	$\displaystyle\;\lim_{n\rightarrow\infty}\frac{\Xi(n)}{n}\frac{1}{\Xi(n)}H_{% \chi}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge\Xi(n)T_{a^{% \prime\Xi(n)}}(a)}\right)$	(152)
	$\displaystyle+\frac{n-\Xi(n)}{n}\frac{1}{n-\Xi(n)}H_{\chi}\left(\textstyle% \bigwedge_{a\in\mathcal{A}}G_{a}^{\wedge(n-\Xi(n))T_{a^{\prime\prime n-\Xi(n)}% }(a)}\right)$	(153)
$\displaystyle=$	$\displaystyle\;\beta\eta(P^{\prime}_{A})+(1-\beta)\eta(P^{\prime\prime}_{A});$	(154)

where (151) comes from (147); (153) follows from the subadditivity of $H_{\chi}$ ; (154) comes from (148), (149) and Lemma 7. Since (154) holds for all $P^{\prime}_{A},P^{\prime\prime}_{A}\in\Delta(\mathcal{A})$ and $\beta\in(0,1)$ , we have that $\eta$ is convex.

A-B1 Proof of Lemma 12

Let $(\overline{a}_{n})_{n\in\mathbb{N}^{\star}}\in\mathcal{A}^{\mathbb{N}^{\star}}$ be a sequence such that $T_{\overline{a}^{n}}\rightarrow P_{A}=\beta P^{\prime}_{A}+(1-\beta)P^{\prime% \prime}_{A}$ when $n\rightarrow\infty$ .

Consider a sequence $(B_{n})_{n\in\mathbb{N}^{\star}}$ of independent Bernoulli random variables such that for all $n\in\mathbb{N}^{\star}$ ,

\displaystyle\mathbb{P}(B_{n}=0)=\frac{\beta P^{\prime}_{A}(\overline{a}_{n})}% {P_{A}(\overline{a}_{n})}.

(155)

By the strong law of large numbers,

\displaystyle\mathbb{P}\left(T_{B^{n},\overline{a}^{n}}\underset{n\rightarrow% \infty}{\rightarrow}(\beta P^{\prime}_{A},(1-\beta)P^{\prime\prime}_{A})\right% )=1.

(156)

Therefore, there exists at least one realization $(b_{n})_{n\in\mathbb{N}^{\star}}$ of $(B_{n})_{n\in\mathbb{N}^{\star}}$ such that $T_{b^{n},\overline{a}^{n}}$ converges to $\big{(}\beta P^{\prime}_{A},(1-\beta)P^{\prime\prime}_{A}\big{)}$ . The convergences of marginal and conditional types yield

	$\displaystyle T_{b^{n}}\underset{n\rightarrow\infty}{\rightarrow}(\beta,1-% \beta),$		(157)
	$\displaystyle T_{a^{\prime n}}\underset{n\rightarrow\infty}{\rightarrow}P^{% \prime}_{A},\qquad T_{a^{\prime\prime n}}\underset{n\rightarrow\infty}{% \rightarrow}P^{\prime\prime}_{A},$		(158)

where $a^{\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{\star},% \\ b_{n}=0\end{subarray}}$ and $a^{\prime\prime}\doteq(\overline{a}_{n})_{\begin{subarray}{c}n\in\mathbb{N}^{% \star},\\ b_{n}=1\end{subarray}}$ are the extracted sequences.

A-C Proof of Lemma 6

It can be easily observed that

		$\displaystyle\exists P_{A}\in\operatorname*{int}(\Delta(\mathcal{A})),\,\gamma% (P_{A})=\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)\gamma(\mathds{1}_{a})$		(159)
	$\displaystyle\Longleftarrow\;\;$	$\displaystyle\forall P_{A}\in\Delta(\mathcal{A}),\,\gamma(P_{A})=\textstyle% \sum_{a\in\mathcal{A}}P_{A}(a)\gamma(\mathds{1}_{a}).$		(160)

Now let us prove (159) $\Rightarrow$ (160). Let $P^{\star}_{A}\in\operatorname*{int}\Delta(\mathcal{A})$ be such that $\gamma(P^{\star}_{A})=\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)\gamma(\mathds{1}_{a})$ . Let $m:\Delta(\mathcal{A})\rightarrow\mathbb{R}$ be a linear function such that $m(P^{\star}_{A})=\gamma(P^{\star}_{A})$ and $\forall P_{A}\in\Delta(\mathcal{A}),\,m(P_{A})\leq\gamma(P_{A})$ . We have

\displaystyle 0=\gamma(P^{\star}_{A})-m(P^{\star}_{A})=\textstyle\sum_{a\in% \mathcal{A}}P^{\star}_{A}(a)\big{(}\gamma(\mathds{1}_{a})-m(\mathds{1}_{a})% \big{)};

(161)

and therefore $\gamma(\mathds{1}_{a})=m(\mathds{1}_{a})$ for all $a\in\mathcal{A}$ , as $\gamma-m\geq 0$ and $P^{\star}_{A}(a)>0$ for all $a\in\mathcal{A}$ . For all $P_{A}\in\Delta(\mathcal{A})$ , we have

	$\displaystyle\gamma(P_{A})$	$\displaystyle\leq\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)\gamma(\mathds{1}_{a})$		(162)
		$\displaystyle=\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)m(\mathds{1}_{a})=m(P_{A% }),$		(163)

which, combined with $m\leq\gamma$ , yields $\gamma=m$ ; hence $\gamma$ is linear.

Appendix B Proof of Proposition 5

In order to prove Proposition 5 we need Lemma 13, which is a consequence of Marton’s formula in Theorem 9 applied to a disjoint union. The proof of Lemma 13 can be found in App. B-A.

Lemma 13.

Let $P_{A}\in\Delta(\mathcal{A})$ , then

\displaystyle\overline{H}\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}\right)% +C\left(\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a},\;\sum_{a\in\mathcal{A}}P_{A}% (a)P_{X_{a}}\right)=H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)H(P_{X_{a}}).

(164)

Let us prove Proposition 5. We have on one hand:

	$\displaystyle H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a% }})$		(165)
	$\displaystyle=H(P_{A})-\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_% {a})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)H(P_{X_{a}})$		(166)
	$\displaystyle\leq H(P_{A})-\overline{H}\left(\textstyle\bigsqcup^{P_{A}}_{a\in% \mathcal{A}}G_{a}\right)+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)H(P_{X_{a}})$		(167)
	$\displaystyle=C\left(\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a},\;\sum% _{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right);$		(168)

where (166) comes from Theorem 9; (167) follows from (15), see [21, Theorem 2]; and (168) follows from Lemma 13. Therefore,

		$\displaystyle C\left(\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a},\;% \textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\textstyle% \sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a}})$		(169)
	$\displaystyle\Longleftrightarrow\;$	$\displaystyle\overline{H}\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_% {a}\right)=\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a}).$		(170)

On the other hand:

$\displaystyle\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})$	$\displaystyle=-\textstyle\sum_{a\in\mathcal{A}}\overline{H}(G_{a})+\textstyle\sum_{a\in\mathcal{A}}H(P_{X_{a}})$	(171)
	$\displaystyle\leq-\overline{H}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}% \right)+\textstyle\sum_{a\in\mathcal{A}}H(P_{X_{a}})$	(172)
	$\displaystyle=-\overline{H}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}% \right)+H\left(\textstyle\bigotimes_{a\in\mathcal{A}}P_{X_{a}}\right)$	(173)
	$\displaystyle=C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;\textstyle% \bigotimes_{a\in\mathcal{A}}P_{X_{a}}\right);$	(174)

where (171) comes from Theorem 9; (172) follows from (14), see [21, Theorem 2]; and (174) also follows from Theorem 9. Therefore,

		$\displaystyle\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})=C\left(% \textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in\mathcal{A}}P_{X_% {a}}\right)$		(175)
	$\displaystyle\Longleftrightarrow\;$	$\displaystyle\overline{H}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}% \right)=\textstyle\sum_{a\in\mathcal{A}}\overline{H}(G_{a}).$		(176)

B-A Proof of Lemma 13

The probabilistic graph $\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}$ has $\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}$ as underlying distribution. Let $A,X$ be two random variables such that $A$ is drawn according to $P_{A}$ , and $X$ is drawn according to $P_{X|A}(\cdot|a)\doteq P_{X_{a}}$ , so that

\displaystyle P_{X}=\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}.

(177)

We have

	$\displaystyle\overline{H}\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_% {a}\right)+C\left(\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a},% \textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)$		(178)
	$\displaystyle=H(X)$		(179)
	$\displaystyle=H(A,X)$		(180)
	$\displaystyle=H(A)+H(X|A)$		(181)
	$\displaystyle=H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)H(P_{X_{a}});$		(182)

where (179) comes from Theorem 9 and (177); and (180) comes from the fact that $A$ can be written as a function of $X$ : by definition, the vertex set of $\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a}$ is $\mathcal{X}=\bigsqcup_{a\in\mathcal{A}}\mathcal{X}_{a}$ and $\operatorname*{supp}P_{X_{a}}\subseteq\mathcal{X}_{a}$ , therefore $A$ is the unique index such that $X\in\mathcal{X}_{A}$ .

Appendix C Proof of Theorem 10

$(\leq)$ By definition of $C_{0}$ and $C(G,P_{X})$ we have

$\displaystyle\sup_{P_{X}\in\Delta(\mathcal{X})}C(G,P_{X})$	$\displaystyle=\sup_{P_{X}\in\Delta(\mathcal{X})}\lim_{\varepsilon\rightarrow 0% }\limsup_{n\rightarrow\infty}\frac{1}{n}\log\alpha\big{(}G^{\wedge n}[\mathcal% {T}^{n}_{\varepsilon}(P_{X})]\big{)}$	(183)
	$\displaystyle\leq\sup_{P_{X}\in\Delta(\mathcal{X})}\lim_{\varepsilon% \rightarrow 0}\limsup_{n\rightarrow\infty}\frac{1}{n}\log\alpha\big{(}G^{% \wedge n}\big{)}$	(184)
	$\displaystyle=C_{0}(G).$	(185)

$(\geq)$ Let $(\mathcal{C}_{n})_{n\in\mathbb{N}^{\star}}$ be a sequence such that for all $n\in\mathbb{N}^{\star}$ , $\mathcal{C}_{n}$ is an independent set in $G^{\wedge n}$ , and

\displaystyle\lim_{n\rightarrow\infty}\frac{1}{n}\log|\mathcal{C}_{n}|=C_{0}(G);

(186)

the existence of the sequence $(\mathcal{C}_{n})_{n\in\mathbb{N}^{\star}}$ follows from the definition of $C_{0}$ .

Let $(\tau_{n})_{n\in\mathbb{N}^{\star}}$ be the sequence defined by: for all $n\in\mathbb{N}^{\star}$ ,

\displaystyle\tau_{n}\doteq\frac{1}{|\mathcal{C}_{n}|}\sum_{x^{n}\in\mathcal{C% }_{n}}T_{x^{n}}.

(187)

The terms of the sequence $(\tau_{n})_{n\in\mathbb{N}^{\star}}$ are in $\Delta(\mathcal{X})$ , which is a compact set. Therefore, by the Bolzano-Weierstrass theorem, $(\tau_{n})_{n\in\mathbb{N}^{\star}}$ has a convergent subsequence $(\tau_{\phi(n)})_{n\in\mathbb{N}^{\star}}$ , where $\phi:\mathbb{N}^{\star}\rightarrow\mathbb{N}^{\star}$ is strictly increasing. We denote by $(\mathcal{C}_{\phi(n)})_{n\in\mathbb{N}^{\star}}$ the corresponding subsequence of independent sets, and

\displaystyle P^{\star}_{X}\doteq\lim_{n\rightarrow\infty}\tau_{\phi(n)}\in% \Delta(\mathcal{X}).

(188)

By construction, we also have

\displaystyle\lim_{n\rightarrow\infty}\frac{\log|\mathcal{C}_{\phi(n)}|}{\phi(% n)}=C_{0}(G).

(189)

Let us build an adequate sequence of codebooks with type converging uniformly to $P^{\star}_{X}$ , and with asymptotic rate $C_{0}(G)$ . For all $n\in\mathbb{N}^{\star}$ , let

\displaystyle\mathcal{C}^{\star}_{n\phi(n)}\doteq(\mathcal{C}_{\phi(n)})^{n}% \cap\mathcal{T}_{\varepsilon_{n}}^{n\phi(n)}(P^{\star}_{X}),

(190)

where $\varepsilon_{n}\doteq\|P^{\star}_{X}-\tau_{\phi(n)}\|_{\infty}+\frac{1}{\sqrt[% 4]{n}}$ .

It can be easily observed that $\varepsilon_{n}\underset{n\rightarrow\infty}{\rightarrow}0$ and $\mathcal{C}^{\star}_{n\phi(n)}\subseteq\mathcal{T}_{\varepsilon_{n}}^{n\phi(n)% }(P^{\star}_{X})$ : by construction we have

\displaystyle\max_{x^{n\phi(n)}\in\mathcal{C}^{\star}_{n\phi(n)}}\|T_{x^{n\phi% (n)}}-P^{\star}_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0.

(191)

Furthermore, for all $n\in\mathbb{N}^{\star}$ , $\mathcal{C}^{\star}_{n\phi(n)}$ is independent in $G^{\wedge n\phi(n)}$ , as it is contained in the independent set $(\mathcal{C}_{\phi(n)})^{n}$ .

Now let us prove that $\frac{\log|\mathcal{C}^{\star}_{n\phi(n)}|}{n\phi(n)}\rightarrow C_{0}(G)$ when $n\rightarrow\infty$ . Let us draw a codeword

\displaystyle C^{n\phi(n)}=(C^{\phi(n)}_{1},...,C^{\phi(n)}_{n})

(192)

uniformly from $(\mathcal{C}_{\phi(n)})^{n}$ , and show that it is in $\mathcal{T}_{\varepsilon_{n}}^{n\phi(n)}(P^{\star}_{X})$ with high probability. On one hand, for all $t\leq n$ , the average type of $C^{\phi(n)}_{t}$ is

\displaystyle\mathbb{E}\left[T_{C^{\phi(n)}_{t}}\right]=\frac{1}{|\mathcal{C}_% {\phi(n)}|}\sum_{c^{\phi(n)}\in\mathcal{C}_{\phi(n)}}T_{c^{\phi(n)}}=\tau_{% \phi(n)}.

(193)

On the other hand,

$\displaystyle\frac{|\mathcal{C}_{n\phi(n)}^{\star}|}{|(\mathcal{C}_{\phi(n)})^{n}|}$	$\displaystyle=\frac{|(\mathcal{C}_{\phi(n)})^{n}\cap\mathcal{T}_{\varepsilon_{n}}^{n\phi(n)}(P^{\star}_{X})|}{|(\mathcal{C}_{\phi(n)})^{n}|}$	(194)
	$\displaystyle=\mathbb{P}\big{(}C^{n\phi(n)}\in\mathcal{T}_{\varepsilon_{n}}^{n\phi(n)}(P^{\star}_{X})\big{)}$	(195)
	$\displaystyle=\mathbb{P}\left(\|T_{C^{n\phi(n)}}-P^{\star}_{X}\|_{\infty}\leq\varepsilon_{n}\right)$	(196)
	$\displaystyle\geq\mathbb{P}\left(\|T_{C^{n\phi(n)}}-\tau_{\phi(n)}\|_{\infty}+\|\tau_{\phi(n)}-P^{\star}_{X}\|_{\infty}\leq\varepsilon_{n}\right)$	(197)
	$\displaystyle=\mathbb{P}\left(\|T_{C^{n\phi(n)}}-\tau_{\phi(n)}\|_{\infty}\leq n^{-1/4}\right)$	(198)
	$\displaystyle=\mathbb{P}\left(\left\|\textstyle\frac{1}{n}\sum_{t\leq n}T_{C_{t}^{\phi(n)}}-\tau_{\phi(n)}\right\|_{\infty}\leq n^{-1/4}\right)$	(199)
	$\displaystyle=1-\mathbb{P}\left(\left\|\left(\textstyle\sum_{t\leq n}T_{C_{t}^{\phi(n)}}\right)-n\tau_{\phi(n)}\right\|_{\infty}>n^{3/4}\right)$	(200)
	$\displaystyle\geq 1-\textstyle\sum_{x\in\mathcal{X}}\mathbb{P}\left(\left|\sum_{t\leq n}T_{C_{t}^{\phi(n)}}(x)-n\tau_{\phi(n)}(x)\right|>n^{3/4}\right)$	(201)
	$\displaystyle\geq\textstyle 1-\sum_{x\in\mathcal{X}}\frac{1}{n^{3/2}}\mathbb{V}\left[\textstyle\sum_{t\leq n}T_{C_{t}^{\phi(n)}}(x)\right]$	(202)
	$\displaystyle\geq\textstyle 1-\frac{|\mathcal{X}|}{n^{1/2}}\underset{n\rightarrow\infty}{\rightarrow}1;$	(203)

where (195) and (199) come from the construction of $C^{n\phi(n)}$ ; (197) follows from the triangle inequality; (198) comes from the construction of $\varepsilon_{n}$ ; (201) follows from the union bound; (202) comes from Chebyshev’s inequality and (193); (203) follows from $\mathbb{V}\left[\textstyle\sum_{t\leq n}T_{C_{t}^{\phi(n)}}(x)\right]$ $=\sum_{t\leq n}\mathbb{V}\left[T_{C_{t}^{\phi(n)}}(x)\right]\leq n$ , as the random variables $T_{C_{t}^{\phi(n)}}(x)$ are iid and take values in $[0,1]$ . Hence

\displaystyle\lim_{n\rightarrow\infty}\frac{\log|\mathcal{C}^{\star}_{n\phi(n)% }|}{n\phi(n)}=\lim_{n\rightarrow\infty}\frac{\log|(\mathcal{C}_{\phi(n)})^{n}|% }{n\phi(n)}=C_{0}(G).

(204)
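As a numerical illustration of why the restriction to typical codewords does not affect the asymptotic rate, the following sketch evaluates the lower bound $1-|\mathcal{X}|/n^{1/2}$ of (203) and the resulting bound on the rate loss. The function names and the numerical values ($|\mathcal{X}|=5$, $n=10^{4}$, block length $2$) are illustrative assumptions only and do not come from the paper.

```python
import math


def typicality_lower_bound(alphabet_size, n):
    """Lower bound 1 - |X| / sqrt(n) on the fraction of typical codewords, clipped at 0."""
    return max(0.0, 1.0 - alphabet_size / math.sqrt(n))


def rate_loss_upper_bound(alphabet_size, n, block_len):
    """Upper bound on the rate lost by keeping only the typical codewords.

    The codebook shrinks by a factor of at least typicality_lower_bound(...),
    and the total length is n * block_len, so the loss is at most
    -log2(bound) / (n * block_len) bits per symbol.
    """
    bound = typicality_lower_bound(alphabet_size, n)
    if bound <= 0.0:
        return math.inf  # the Chebyshev bound is vacuous for such a small n
    return -math.log2(bound) / (n * block_len)


# Illustrative values: alphabet of size 5, n = 10**4 blocks of length 2.
print(typicality_lower_bound(5, 10**4))
print(rate_loss_upper_bound(5, 10**4, 2))
```

The rate loss is divided by the total length $n\phi(n)$ while the fraction of typical codewords tends to one, so the loss vanishes as $n$ grows, which is the content of (204).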

By combining (191), (204) and Lemma 4, it follows that

\displaystyle C_{0}(G)=\lim_{n\rightarrow\infty}\frac{\log|\mathcal{C}^{\star}% _{n\phi(n)}|}{n\phi(n)}\leq C(G,P^{\star}_{X}).

(205)

In conclusion, we have constructed $P^{\star}_{X}\in\Delta(\mathcal{X})$ such that

\displaystyle C_{0}(G)=C(G,P^{\star}_{X})=\max_{P_{X}}C(G,P_{X}).

(206)

Appendix D Proof of Proposition 6

Let us show that for every graph $G=(\mathcal{X},\mathcal{E})$ , the function $P_{X}\mapsto C(G,P_{X})$ is concave. Let $P_{X},P^{\prime}_{X}\in\Delta(\mathcal{X})$ and $\beta\in[0,1]$ . Let $(b_{n})_{n\in\mathbb{N}^{\star}}$ be a sequence of integers such that $0\leq b_{n}\leq n$ for all $n$ and $\frac{b_{n}}{n}\underset{n\rightarrow\infty}{\rightarrow}\beta$ .

By Lemma 4, there exist two sequences $(\mathcal{C}_{n})_{n\in\mathbb{N}^{\star}}$ and $(\mathcal{C}^{\prime}_{n})_{n\in\mathbb{N}^{\star}}$ that satisfy the following:

\displaystyle\forall n\in\mathbb{N}^{\star},\;\mathcal{C}_{n}\subseteq\mathcal% {X}^{n}\text{ and }\mathcal{C}^{\prime}_{n}\subseteq\mathcal{X}^{n}\text{ are % independent in }G^{\wedge n};

(207)

and

	$\displaystyle\frac{\log|\mathcal{C}_{n}|}{n}\underset{n\rightarrow\infty}{\rightarrow}C(G,P_{X}),$	$\displaystyle\frac{\log|\mathcal{C}^{\prime}_{n}|}{n}\underset{n\rightarrow\infty}{\rightarrow}C(G,P^{\prime}_{X}),$		(208)
	$\displaystyle\max_{x^{n}\in\mathcal{C}_{n}}\|T_{x^{n}}-P_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0,$	$\displaystyle\max_{x^{n}\in\mathcal{C}^{\prime}_{n}}\|T_{x^{n}}-P^{\prime}_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0.$		(209)

Let us build a sequence of codebooks $(\mathcal{C}^{\prime\prime}_{n})_{n\in\mathbb{N}^{\star}}$ adapted to the distribution $\beta P_{X}+(1-\beta)P^{\prime}_{X}$ by using a time-sharing between $(\mathcal{C}_{n})_{n\in\mathbb{N}^{\star}}$ and $(\mathcal{C}^{\prime}_{n})_{n\in\mathbb{N}^{\star}}$ . For all $n\in\mathbb{N}^{\star}$ , let

\displaystyle\mathcal{C}^{\prime\prime}_{n}\doteq\mathcal{C}_{n}^{b_{n}}\times% \mathcal{C}^{\prime n-b_{n}}_{n}.

(210)

For all $n\in\mathbb{N}^{\star}$ , $\mathcal{C}^{\prime\prime}_{n}\subseteq\mathcal{X}^{n^{2}}$ is independent in $G^{\wedge n^{2}}$ as a product of independent sets.

The rate associated to $\mathcal{C}^{\prime\prime}_{n}$ writes

$\displaystyle\frac{\log|\mathcal{C}^{\prime\prime}_{n}|}{n^{2}}$	$\displaystyle=\frac{b_{n}\log|\mathcal{C}_{n}|+(n-b_{n})\log|\mathcal{C}^{\prime}_{n}|}{n^{2}}$	(211)
	$\displaystyle=\frac{b_{n}}{n}\frac{\log|\mathcal{C}_{n}|}{n}+\frac{n-b_{n}}{n}\frac{\log|\mathcal{C}^{\prime}_{n}|}{n}$	(212)
	$\displaystyle\underset{n\rightarrow\infty}{\rightarrow}\beta C(G,P_{X})+(1-% \beta)C(G,P^{\prime}_{X});$	(213)

and the types of the codewords in $\mathcal{C}^{\prime\prime}_{n}$ satisfy

	$\displaystyle\max_{x^{n^{2}}\in\mathcal{C}^{\prime\prime}_{n}}\left\|T_{x^{n^{2}}}-\beta P_{X}-(1-\beta)P^{\prime}_{X}\right\|_{\infty}$	(214)
$\displaystyle=\>$	$\displaystyle\max_{x^{nb_{n}}\in\mathcal{C}_{n}}\max_{x^{\prime n(n-b_{n})}\in\mathcal{C}^{\prime}_{n}}\left\|\frac{nb_{n}}{n^{2}}T_{x^{nb_{n}}}+\frac{n(n-b_{n})}{n^{2}}T_{x^{\prime n(n-b_{n})}}-\beta P_{X}-(1-\beta)P^{\prime}_{X}\right\|_{\infty}$	(215)
$\displaystyle\leq\>$	$\displaystyle\max_{x^{nb_{n}}\in\mathcal{C}_{n}}\left\|\frac{b_{n}}{n}T_{x^{nb_{n}}}-\beta P_{X}\right\|_{\infty}+\max_{x^{\prime n(n-b_{n})}\in\mathcal{C}^{\prime}_{n}}\left\|\frac{n-b_{n}}{n}T_{x^{\prime n(n-b_{n})}}-(1-\beta)P^{\prime}_{X}\right\|_{\infty}$	(216)
$\displaystyle=\>$	$\displaystyle\beta\max_{x^{nb_{n}}\in\mathcal{C}_{n}}\left\|T_{x^{nb_{n}}}-P_{X}+o(1)T_{x^{nb_{n}}}\right\|_{\infty}$
	$\displaystyle+(1-\beta)\max_{x^{\prime n(n-b_{n})}\in\mathcal{C}^{\prime}_{n}}\left\|T_{x^{\prime n(n-b_{n})}}-P^{\prime}_{X}+o(1)T_{x^{\prime n(n-b_{n})}}\right\|_{\infty}$	(217)
$\displaystyle\leq\>$	$\displaystyle\beta\max_{x^{nb_{n}}\in\mathcal{C}_{n}}\left\|T_{x^{nb_{n}}}-P_{X}\right\|_{\infty}+o(1)\left\|T_{x^{nb_{n}}}\right\|_{\infty}$
	$\displaystyle+(1-\beta)\max_{x^{\prime n(n-b_{n})}\in\mathcal{C}^{\prime}_{n}}\left\|T_{x^{\prime n(n-b_{n})}}-P^{\prime}_{X}\right\|_{\infty}+o(1)\left\|T_{x^{\prime n(n-b_{n})}}\right\|_{\infty}$	(218)
	$\displaystyle\underset{n\rightarrow\infty}{\rightarrow}0.$	(219)
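The time-sharing construction (210) can be checked on a toy instance. The sketch below is only illustrative: the function names and the two small codebooks are assumptions, and the independence of the codebooks in $G^{\wedge n}$ is not verified, since only the cardinality in (211) and the types in (215) are examined.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def empirical_type(seq):
    """Return the type (empirical distribution) of seq as exact fractions."""
    n = len(seq)
    return {s: Fraction(c, n) for s, c in Counter(seq).items()}


def time_share(code_a, code_b, b, n):
    """Build the product codebook code_a^b x code_b^(n - b).

    Each codeword is the concatenation of b words of code_a followed by
    n - b words of code_b; codewords are represented as tuples.
    """
    blocks = [code_a] * b + [code_b] * (n - b)
    return [sum(choice, ()) for choice in product(*blocks)]


# Toy instance with n = 2 and b = 1: codewords of length 2, product of length 4.
code_a = [(0, 1), (1, 0)]  # every codeword has type (1/2, 1/2) on {0, 1}
code_b = [(2, 2)]          # every codeword has type concentrated on 2
shared = time_share(code_a, code_b, b=1, n=2)
print(len(shared), [empirical_type(word) for word in shared])
```

Here the cardinality equals $|\mathcal{C}_{n}|^{b_{n}}|\mathcal{C}^{\prime}_{n}|^{n-b_{n}}$ and every codeword has type $\frac{b_{n}}{n}P_{X}+\frac{n-b_{n}}{n}P^{\prime}_{X}$, which is the convex combination appearing in (215).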

By Lemma 4, $\lim_{n\rightarrow\infty}\frac{\log|\mathcal{C}^{\prime\prime}_{n}|}{n^{2}}% \leq C(G,\beta P_{X}+(1-\beta)P^{\prime}_{X})$ , thus

\displaystyle\beta C(G,P_{X})+(1-\beta)C(G,P^{\prime}_{X})\leq C(G,\beta P_{X}% +(1-\beta)P^{\prime}_{X}).

(220)

The function $P_{X}\mapsto C(G,P_{X})$ is concave on the convex compact set $\Delta(\mathcal{X})$ , therefore its set of maximizers $\mathcal{P}^{\star}(G)=\operatorname*{\arg\!\max}_{P_{X}\in\Delta(\mathcal{X})% }C(G,P_{X})$ is convex. Furthermore, by Theorem 10, the set $\mathcal{P}^{\star}(G)$ is nonempty and satisfies

\displaystyle\forall P_{X}\in\mathcal{P}^{\star}(G),\;C(G,P_{X})=C_{0}(G).

(221)

Appendix E Proof of Theorem 11

The proof techniques used here are similar to those in the proof of Theorem 10 in App. C.

Let us start by showing that Theorem 11 is true when $\mathcal{A}$ has two elements. Let $G=(\mathcal{X},\mathcal{E})$ , and $G^{\prime}=(\mathcal{X}^{\prime},\mathcal{E^{\prime}})$ be two graphs, and let $P_{X,X^{\prime}}\in\mathcal{P}^{\star}(G\wedge G^{\prime})$ . We will prove that $P_{X}\otimes P_{X^{\prime}}$ is also capacity-achieving by building an adequate sequence of codebooks.

For all $n\in\mathbb{N}^{\star}$ , let $\mathcal{C}_{n}\subseteq(\mathcal{X}\times\mathcal{X}^{\prime})^{n}$ such that $\mathcal{C}_{n}$ is an independent set in $(G\wedge G^{\prime})^{\wedge n}$ , and

	$\displaystyle\frac{1}{n}\log|\mathcal{C}_{n}|\underset{n\rightarrow\infty}{\rightarrow}C_{0}(G\wedge G^{\prime}),$		(222)
	$\displaystyle\max_{(x^{n},x^{\prime n})\in\mathcal{C}_{n}}\|T_{x^{n},x^{\prime n}}-P_{X,X^{\prime}}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0.$		(223)

The existence of such a sequence is given by Lemma 4, and Proposition 6. Let

\displaystyle Q^{(n)}_{X,X^{\prime}}\doteq\frac{1}{|\mathcal{C}_{n}|}\sum_{(x^% {n},x^{\prime n})\in\mathcal{C}_{n}}T_{x^{n},x^{\prime n}}.

(224)

An immediate observation is that

\displaystyle Q^{(n)}_{X,X^{\prime}}\underset{n\rightarrow\infty}{\rightarrow}% P_{X,X^{\prime}}

(225)

as a consequence of (223).

Let us build a sequence of codebooks with asymptotic rate $C_{0}(G\wedge G^{\prime})$ , such that the types of their codewords converge uniformly to $P_{X}\otimes P_{X^{\prime}}$ :

\displaystyle\mathcal{C}^{\star}_{n^{3}}\doteq\mathcal{T}^{n^{3}}_{\varepsilon% _{n}}(P_{X}\otimes P_{X^{\prime}})\cap\left(\textstyle\prod_{t\leq n}\mathcal{% C}^{(t)}_{n}\right)^{n};

(226)

where

\displaystyle\varepsilon_{n}\doteq\|Q^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}}-P_% {X}\otimes P_{X^{\prime}}\|_{\infty}+\textstyle\frac{1}{\sqrt[4]{n}};

(227)

and where for all $t\leq n$ , the shifted codebook $\mathcal{C}_{n}^{(t)}$ is defined by

\displaystyle\mathcal{C}^{(t)}_{n}\doteq\Big{\{}\Big{(}(x_{t},x_{t+1},...,x_{n% },x_{1},...,x_{t-1}),x^{\prime n}\Big{)}\>\Big{|}\>(x^{n},x^{\prime n})\in% \mathcal{C}_{n}\Big{\}}.

(228)

By construction, $\mathcal{C}^{\star}_{n^{3}}\subseteq\mathcal{T}^{n^{3}}_{\varepsilon_{n}}(P_{X% }\otimes P_{X^{\prime}})$ thanks to (226), and $\varepsilon_{n}\underset{n\rightarrow\infty}{\rightarrow}0$ thanks to (227) and (225); therefore we have

\displaystyle\max_{x^{n^{3}}\in\mathcal{C}^{\star}_{n^{3}}}\|T_{x^{n^{3}}}-P_{X}\otimes P_{X^{\prime}}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0.

(229)

Furthermore, $\mathcal{C}^{\star}_{n^{3}}$ is an independent set in $(G\wedge G^{\prime})^{\wedge n^{3}}$ , as it is contained in the product independent set $\left(\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}\right)^{n}$ ; note that this holds because the shifted codebook $\mathcal{C}^{(t)}_{n}$ is an independent set in $(G\wedge G^{\prime})^{\wedge n}$ for all $t\leq n$ .

Now let us prove that $\frac{\log|\mathcal{C}^{\star}_{n^{3}}|}{n^{3}}\underset{n\rightarrow\infty}{% \rightarrow}C_{0}(G\wedge G^{\prime})$ . Let us draw a codeword uniformly from $\left(\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}\right)^{n}$ :

\displaystyle C^{n^{3}}\doteq(C^{n^{2}}_{1},...,C^{n^{2}}_{n}),

(230)

where for all $t\leq n$ , $C^{n^{2}}_{t}$ is a random $n\times n$ -sequence drawn uniformly from $\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}$ . We want to prove that $C^{n^{3}}\in\mathcal{T}^{n^{3}}_{\varepsilon_{n}}(P_{X}\otimes P_{X^{\prime}})$ with high probability.

On one hand we have to determine the average type of the random variables $(C^{n^{2}}_{t})_{t\leq n}$ which are iid copies of $C^{n^{2}}=(C^{n}_{1},...,C^{n}_{n})$ ; where each $C^{n}_{t}$ is drawn uniformly from $\mathcal{C}^{(t)}_{n}$ , and the $(C^{n}_{t})_{t\leq n}$ are mutually independent.

$\displaystyle\mathbb{E}\left[T_{C^{n^{2}}_{t}}\right]$	$\displaystyle=\frac{1}{n}\sum_{t\leq n}\mathbb{E}\left[T_{C^{n}_{t}}\right]$	(231)
	$\displaystyle=\frac{1}{n}\sum_{t\leq n}\frac{1}{|\mathcal{C}^{(t)}_{n}|}\sum_{(x^{n},x^{\prime n})\in\mathcal{C}^{(t)}_{n}}T_{x^{n},x^{\prime n}}$	(232)
	$\displaystyle=\frac{1}{n}\sum_{t\leq n}\frac{1}{|\mathcal{C}_{n}|}\sum_{(x^{n},x^{\prime n})\in\mathcal{C}_{n}}T_{\sigma_{t}(x^{n}),x^{\prime n}}$	(233)
	$\displaystyle=\frac{1}{|\mathcal{C}_{n}|}\sum_{(x^{n},x^{\prime n})\in\mathcal{C}_{n}}\frac{1}{n}\sum_{t\leq n}T_{\sigma_{t}(x^{n}),x^{\prime n}}$	(234)
	$\displaystyle=\frac{1}{|\mathcal{C}_{n}|}\sum_{(x^{n},x^{\prime n})\in\mathcal{C}_{n}}T_{x^{n}}\otimes T_{x^{\prime n}}$	(235)
	$\displaystyle=Q^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}},$	(236)

where $\sigma_{t}(x^{n})=(x_{t},x_{t+1},...,x_{n},x_{1},...,x_{t-1})$ ; (233) comes from the construction of $\mathcal{C}^{(t)}_{n}$ in (228); and (235) comes from the following observation:

	$\displaystyle\sum_{t\leq n}T_{\sigma_{t}(x^{n}),x^{\prime n}}=\sum_{t\leq n}% \sum_{s\leq n}T_{x_{s+t},x^{\prime}_{s}}=\sum_{s\leq n}\sum_{t\leq n}T_{x_{s+t% },x^{\prime}_{s}}$		(237)
	$\displaystyle=\sum_{s\leq n}T_{x^{n},(x^{\prime}_{s},...,x^{\prime}_{s})}=\sum% _{s\leq n}T_{x^{n}}\otimes T_{x^{\prime}_{s}}=T_{x^{n}}\otimes T_{x^{\prime n}},$		(238)

where the index $s+t$ is taken modulo $n$ .
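The step from (234) to (235) rests on the fact that averaging the joint type over all cyclic shifts of the first component yields the product of the marginal types. A minimal sketch of this identity, with hypothetical function names and an arbitrary toy pair of sequences, is the following.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def joint_type(xs, ys):
    """Joint type of the pair of sequences (xs, ys), as exact fractions."""
    n = len(xs)
    return {pair: Fraction(c, n) for pair, c in Counter(zip(xs, ys)).items()}


def shift(xs, t):
    """Cyclic shift (x_t, ..., x_n, x_1, ..., x_{t-1}) for t in 1..n."""
    return xs[t - 1:] + xs[:t - 1]


def average_shifted_type(xs, ys):
    """Average over t = 1..n of the joint type of (shift(xs, t), ys)."""
    n = len(xs)
    total = defaultdict(Fraction)
    for t in range(1, n + 1):
        for pair, p in joint_type(shift(xs, t), ys).items():
            total[pair] += p
    return {pair: p / n for pair, p in total.items()}


def product_of_marginals(xs, ys):
    """Product of the marginal types of xs and ys."""
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    return {(a, b): Fraction(cx[a] * cy[b], n * n) for a in cx for b in cy}


xs, ys = (0, 0, 1), ("u", "v", "v")
print(average_shifted_type(xs, ys) == product_of_marginals(xs, ys))
```

For a fixed position $s$, the shifted first component visits every symbol of $x^{n}$ exactly once as $t$ ranges over $1,...,n$, so each pair $(a,b)$ is counted as many times as the product of the occurrence counts of $a$ in $x^{n}$ and of $b$ in $x^{\prime n}$.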

On the other hand we have

	$\displaystyle\frac{|\mathcal{C}^{\star}_{n^{3}}|}{\left|\left(\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}\right)^{n}\right|}$		(239)
	$\displaystyle=\frac{\left|\mathcal{T}^{n^{3}}_{\varepsilon_{n}}(P_{X}\otimes P_{X^{\prime}})\cap\left(\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}\right)^{n}\right|}{\left|\left(\textstyle\prod_{t\leq n}\mathcal{C}^{(t)}_{n}\right)^{n}\right|}$		(240)
	$\displaystyle=\mathbb{P}\left(C^{n^{3}}\in\mathcal{T}^{n^{3}}_{\varepsilon_{n}}(P_{X}\otimes P_{X^{\prime}})\right)$		(241)
	$\displaystyle=\mathbb{P}\left(\left\|\textstyle\frac{1}{n}\sum_{t\leq n}T_{C_{t}^{n^{2}}}-P_{X}\otimes P_{X^{\prime}}\right\|_{\infty}\leq\varepsilon_{n}\right)$		(242)
	$\displaystyle\geq\mathbb{P}\left(\left\|\textstyle\frac{1}{n}\sum_{t\leq n}T_{C_{t}^{n^{2}}}-Q^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}}\right\|_{\infty}+\left\|Q^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}}-P_{X}\otimes P_{X^{\prime}}\right\|_{\infty}\leq\varepsilon_{n}\right)$		(243)
	$\displaystyle=\mathbb{P}\left(\left\|\textstyle\sum_{t\leq n}T_{C_{t}^{n^{2}}}-nQ^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}}\right\|_{\infty}\leq n^{3/4}\right)$		(244)
	$\displaystyle\geq 1-\textstyle\sum_{(x,x^{\prime})\in\mathcal{X}\times\mathcal{X}^{\prime}}\mathbb{P}\left(\left|\textstyle\sum_{t\leq n}T_{C_{t}^{n^{2}}}(x,x^{\prime})-nQ^{(n)}_{X}\otimes Q^{(n)}_{X^{\prime}}(x,x^{\prime})\right|>n^{3/4}\right)$		(245)
	$\displaystyle\geq 1-\textstyle\sum_{(x,x^{\prime})\in\mathcal{X}\times\mathcal{X}^{\prime}}\frac{1}{n^{3/2}}\mathbb{V}\left[\textstyle\sum_{t\leq n}T_{C_{t}^{n^{2}}}(x,x^{\prime})\right]$		(246)
	$\displaystyle\geq 1-\textstyle\frac{|\mathcal{X}||\mathcal{X}^{\prime}|}{n^{1/2}}\underset{n\rightarrow\infty}{\rightarrow}1;$		(247)

where (241) and (242) come from the construction of $C^{n^{3}}$ ; (243) follows from the triangle inequality; (244) comes from the construction of $\varepsilon_{n}$ ; (245) follows from the union bound; (246) comes from Chebyshev’s inequality and (236); and (247) comes from the fact that $\mathbb{V}\left[\sum_{t\leq n}T_{C_{t}^{n^{2}}}(x,x^{\prime})\right]=\sum_{t\leq n}\mathbb{V}\left[T_{C_{t}^{n^{2}}}(x,x^{\prime})\right]\leq n$ , as the random variables $T_{C_{t}^{n^{2}}}(x,x^{\prime})$ are iid and take values in $[0,1]$ . Hence

\displaystyle\lim_{n\rightarrow\infty}\frac{\log|\mathcal{C}^{\star}_{n^{3}}|}% {n^{3}}=\lim_{n\rightarrow\infty}\frac{\log\left|\left(\textstyle\prod_{t\leq n% }\mathcal{C}^{(t)}_{n}\right)^{n}\right|}{n^{3}}=\lim_{n\rightarrow\infty}% \frac{\log|\mathcal{C}_{n}|}{n}=C_{0}(G\wedge G^{\prime});

(248)

where the second equality holds as the shifted codebooks $(\mathcal{C}^{(t)}_{n})_{t\leq n}$ all have cardinality $|\mathcal{C}_{n}|$ .

Thus, by combining (248), Lemma 4, and Proposition 6 we obtain

\displaystyle C_{0}(G\wedge G^{\prime})=\lim_{n\rightarrow\infty}\frac{\log|% \mathcal{C}^{\star}_{n^{3}}|}{n^{3}}\leq C(G\wedge G^{\prime},P_{X}\otimes P_{% X^{\prime}})\leq C_{0}(G\wedge G^{\prime}),

(249)

hence $P_{X}\otimes P_{X^{\prime}}\in\mathcal{P}^{\star}(G\wedge G^{\prime})$ .

Therefore, Theorem 11 is proved when $\mathcal{A}$ has two elements:

\displaystyle P_{X,X^{\prime}}\in\mathcal{P}^{\star}(G\wedge G^{\prime})% \Longrightarrow P_{X}\otimes P_{X^{\prime}}\in\mathcal{P}^{\star}(G\wedge G^{% \prime}).

(250)

Now let us consider the case where $\mathcal{A}$ has cardinality greater than 2. Let $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}(\bigwedge_{a\in\mathcal{A}}G_{a})$ . By considering the product graphs

\displaystyle\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}=\Big{(}\bigwedge_{1% \leq i<i^{\star}}G_{i}\Big{)}\wedge\Big{(}\bigwedge_{i^{\star}\leq i\leq|% \mathcal{A}|}G_{i}\Big{)};

(251)

for all $i^{\star}\leq|\mathcal{A}|$ , and applying (250) successively, we obtain

$\displaystyle P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle\Longrightarrow P_{X_{1}}\otimes P_{X_{2},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	(252)
	$\displaystyle\Longrightarrow(P_{X_{1}}\otimes P_{X_{2}})\otimes P_{X_{3},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	(253)
	$\displaystyle\Longrightarrow\textstyle\bigotimes_{a\in\mathcal{A}}P_{X_{a}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right).$	(254)

Appendix F Results on capacity-achieving distributions

F-A Proof of Proposition 9

Let $G$ be a vertex-transitive graph, and let $P_{X}\in\mathcal{P}^{\star}(G)$ . Let $\psi\in\operatorname*{Aut}(G)$ . We first prove that $P_{\psi(X)}\in\mathcal{P}^{\star}(G)$ , and then conclude by using the convexity of $\mathcal{P}^{\star}(G)$ .

Let $(\mathcal{C}_{n})_{n\in\mathbb{N}^{\star}}$ be a sequence such that

	$\displaystyle\forall n\in\mathbb{N}^{\star},\;\mathcal{C}_{n}\subseteq\mathcal{X}^{n}\text{ is an independent set in }G^{\wedge n},$		(255)
	$\displaystyle\max_{x^{n}\in\mathcal{C}_{n}}\|T_{x^{n}}-P_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0,$		(256)
	$\displaystyle\frac{\log|\mathcal{C}_{n}|}{n}\underset{n\rightarrow\infty}{\rightarrow}C(G,P_{X})=C_{0}(G).$		(257)

The existence of such a sequence is given by Lemma 4. Note that the last equality in (257) comes from the assumption $P_{X}\in\mathcal{P}^{\star}(G)$ .

Now, for all $n\in\mathbb{N}^{\star}$ the codebook

\displaystyle\psi(\mathcal{C}_{n})\doteq\{(\psi(x_{1}),...,\psi(x_{n}))\>|\>x^% {n}\in\mathcal{C}_{n}\}

(258)

is also independent in $G^{\wedge n}$ , as $\psi$ is a graph automorphism and therefore preserves adjacencies. We have by construction

\displaystyle\max_{x^{n}\in\psi(\mathcal{C}_{n})}\|T_{x^{n}}-P_{\psi(X)}\|_{% \infty}\underset{n\rightarrow\infty}{\rightarrow}0.

(259)

Furthermore, since $\psi$ is a bijection we have $|\psi(\mathcal{C}_{n})|=|\mathcal{C}_{n}|$ for all $n\in\mathbb{N}^{\star}$ , thus

\displaystyle\frac{\log|\psi(\mathcal{C}_{n})|}{n}=\frac{\log|\mathcal{C}_{n}|% }{n}\underset{n\rightarrow\infty}{\rightarrow}C_{0}(G).

(260)

Hence

\displaystyle P_{\psi(X)}\in\mathcal{P}^{\star}(G).

(261)
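The two facts used above (an automorphism maps an independent set of $G^{\wedge n}$ to an independent set of the same size) can be checked on a small instance. The following is a minimal sketch, assuming $G$ is the pentagon $C_{5}$ with vertices $0,\dots,4$, the classical independent set of size 5 in $C_{5}^{\wedge 2}$, and a reflection chosen purely for illustration; all function names are hypothetical.

```python
from itertools import combinations

N = 5  # number of vertices of the pentagon C_5 (illustrative choice of G)

def adjacent(u, v):
    """Adjacency in C_5: vertices at cyclic distance 1."""
    return (u - v) % N in (1, N - 1)

def confusable(x, y):
    """Adjacency in the AND product: distinct words whose coordinates
    are pairwise equal or adjacent."""
    return x != y and all(a == b or adjacent(a, b) for a, b in zip(x, y))

def is_independent(code):
    """True if no two codewords are adjacent in the AND product."""
    return not any(confusable(x, y) for x, y in combinations(code, 2))

def is_automorphism(psi):
    """psi[v] is the image of v; check bijectivity and that adjacency
    and non-adjacency are both preserved."""
    return sorted(psi) == list(range(N)) and all(
        adjacent(u, v) == adjacent(psi[u], psi[v])
        for u in range(N) for v in range(N)
    )

# Independent set of size 5 in C_5^{and 2}: codewords (i, 2i mod 5).
code = [(i, (2 * i) % N) for i in range(N)]
# A reflection of the pentagon, applied letter by letter as in (258).
psi = tuple((2 - v) % N for v in range(N))
image = [tuple(psi[a] for a in x) for x in code]

print(is_automorphism(psi), is_independent(code), is_independent(image))
print(len(set(image)) == len(code))
```

The check only covers block length $n=2$; the asymptotic statements (259) and (260) are not something a finite computation can establish.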

Now, for all $v,x^{\prime}\in\mathcal{X}$ , denote by $\mathcal{S}_{x^{\prime}\rightarrow v}\subseteq\operatorname*{Aut}(G)$ the set of automorphisms that map $x^{\prime}$ to $v$ ; note that this set is nonempty thanks to the vertex-transitivity of $G$ . We have for all $v\in\mathcal{X}$

\displaystyle\operatorname*{Aut}(G)=\textstyle\bigsqcup_{x^{\prime}\in\mathcal% {X}}\mathcal{S}_{x^{\prime}\rightarrow v}.

(262)

Furthermore, for all $v\in\mathcal{X}$ , all the sets $(\mathcal{S}_{x^{\prime}\rightarrow v})_{x^{\prime}\in\mathcal{X}}$ have the same cardinality: for all $x^{\prime},x^{\prime\prime}\in\mathcal{X}$ ,

\displaystyle\mathcal{S}_{x^{\prime\prime}\rightarrow v}\circ\psi_{1}\subseteq% \mathcal{S}_{x^{\prime}\rightarrow v},

(263)

where $\psi_{1}\in\mathcal{S}_{x^{\prime}\rightarrow x^{\prime\prime}}$ . It follows that for all $v,x^{\prime}\in\mathcal{X}$ ,

\displaystyle|\mathcal{S}_{x^{\prime}\rightarrow v}|=\frac{|\operatorname*{Aut% }(G)|}{|\mathcal{X}|}.

(264)

Therefore, we have

	$\displaystyle\frac{1}{|\operatorname{Aut}(G)|}\sum_{\psi\in\operatorname{Aut}(G)}P_{\psi(X)}\qquad\in\mathcal{P}^{\star}(G)$	(265)
$\displaystyle=\;$	$\displaystyle\bigg{(}\frac{1}{|\operatorname{Aut}(G)|}\sum_{\psi\in\operatorname{Aut}(G)}P_{X}(\psi^{-1}(v))\bigg{)}_{v\in\mathcal{X}}$	(266)
$\displaystyle=\;$	$\displaystyle\bigg{(}\frac{1}{|\operatorname{Aut}(G)|}\sum_{x^{\prime}\in\mathcal{X}}|\mathcal{S}_{x^{\prime}\rightarrow v}|P_{X}(x^{\prime})\bigg{)}_{v\in\mathcal{X}}$	(267)
$\displaystyle=\;$	$\displaystyle\bigg{(}\frac{1}{|\operatorname{Aut}(G)|}\sum_{x^{\prime}\in\mathcal{X}}\frac{|\operatorname{Aut}(G)|}{|\mathcal{X}|}P_{X}(x^{\prime})\bigg{)}_{v\in\mathcal{X}}$	(268)
$\displaystyle=\;$	$\displaystyle\operatorname*{Unif}(\mathcal{X});$	(269)

where (265) comes from the convexity of $\mathcal{P}^{\star}(G)$ given by Proposition 6 and (261); (267) comes from (262); and (268) comes from (264).
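The averaging step can be reproduced numerically on a small vertex-transitive graph. The sketch below assumes $G=C_{5}$ and an arbitrary distribution $P_{X}$ picked for illustration (it is not claimed to be capacity-achieving, and the averaging identity does not depend on that); function and variable names are hypothetical. It enumerates $\operatorname{Aut}(C_{5})$ by brute force, checks the fiber sizes of (264), and averages the push-forward distributions $P_{\psi(X)}$ with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

N = 5  # vertices of the pentagon C_5 (illustrative vertex-transitive graph)

def adjacent(u, v):
    """Adjacency in C_5: vertices at cyclic distance 1."""
    return (u - v) % N in (1, N - 1)

# Brute-force enumeration of Aut(C_5): adjacency-preserving permutations.
automorphisms = [
    p for p in permutations(range(N))
    if all(adjacent(u, v) == adjacent(p[u], p[v])
           for u in range(N) for v in range(N))
]

def pushforward(P, psi):
    """Distribution of psi(X) when X ~ P, i.e. Q(psi(x)) = P(x)."""
    Q = [Fraction(0)] * N
    for x in range(N):
        Q[psi[x]] += P[x]
    return Q

# Sizes of the sets S_{x' -> v} of automorphisms mapping x' to v.
fiber_sizes = {
    (x, v): sum(1 for psi in automorphisms if psi[x] == v)
    for x in range(N) for v in range(N)
}

# Arbitrary non-uniform distribution, chosen only for illustration.
P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
     Fraction(1, 16), Fraction(1, 16)]

average = [
    sum(pushforward(P, psi)[v] for psi in automorphisms) / len(automorphisms)
    for v in range(N)
]
print(len(automorphisms), set(fiber_sizes.values()), average)
```

Using `Fraction` keeps the comparison with the uniform distribution exact, so no floating-point tolerance is needed.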

F-B Proof of Lemma 5

Let $(w_{a})_{a\in\mathcal{A}}\in\mathbb{R}^{|\mathcal{A}|}$ , and maximize

\displaystyle\zeta:P_{A}\mapsto H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)w_{a}.

(270)

It can be easily observed that $\zeta$ is strictly concave, hence the existence and uniqueness of the maximum. We have

\displaystyle\nabla\zeta(P_{A})=\left(-\log P_{A}(a)-\frac{1}{\ln 2}+w_{a}% \right)_{a\in\mathcal{A}},

(271)

hence

$\displaystyle\nabla\zeta(P_{A})\perp\Delta(\mathcal{A})$	$\displaystyle\Longleftrightarrow\exists C\in\mathbb{R},\,\nabla\zeta(P_{A})=(C% ,...,C)$	(272)
	$\displaystyle\Longleftrightarrow\exists C^{\prime}\in\mathbb{R},\,(-\log P_{A}% (a)+w_{a})_{a\in\mathcal{A}}=(C^{\prime},...,C^{\prime})$	(273)
	$\displaystyle\Longleftrightarrow\exists C^{\prime}\in\mathbb{R},\,P_{A}=2^{-C^% {\prime}}\left(2^{w_{a}}\right)_{a\in\mathcal{A}}$	(274)

The value of $C^{\prime}$ can be deduced from the fact that $P_{A}$ is a probability distribution: $2^{C^{\prime}}$ is the normalization constant $\sum_{a^{\prime}\in\mathcal{A}}2^{w_{a^{\prime}}}$ . Hence the maximizer of $\zeta$ is

\displaystyle P^{\star}_{A}=\left(\frac{2^{w_{a}}}{\sum_{a^{\prime}\in\mathcal% {A}}2^{w_{a^{\prime}}}}\right)_{a\in\mathcal{A}};

(275)

and we have

	$\displaystyle\zeta(P^{\star}_{A})$	$\displaystyle=\sum_{a\in\mathcal{A}}\ P^{\star}_{A}(a)\left(\log\left(\frac{% \sum_{a^{\prime}\in\mathcal{A}}2^{w_{a^{\prime}}}}{2^{w_{a}}}\right)+w_{a}\right)$		(276)
		$\displaystyle=\log\left(\sum_{a^{\prime}\in\mathcal{A}}2^{w_{a^{\prime}}}\right).$		(277)
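The closed forms (275) and (277) are easy to sanity-check numerically. The sketch below assumes base-2 logarithms, as in the derivation above, and a weight vector chosen only for illustration; the function names are hypothetical. It compares $\zeta(P^{\star}_{A})$ with $\log\sum_{a}2^{w_{a}}$ and verifies that randomly drawn distributions never exceed it.

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def zeta(p, w):
    """Objective of (270): H(P_A) + sum_a P_A(a) w_a."""
    return entropy(p) + sum(q * wa for q, wa in zip(p, w))

def maximizer(w):
    """Candidate maximizer from (275): P*(a) proportional to 2^{w_a}."""
    z = sum(2.0 ** wa for wa in w)
    return [2.0 ** wa / z for wa in w]

def random_distribution(k, rng):
    """A random point of the simplex (normalized uniform draws)."""
    x = [rng.random() for _ in range(k)]
    s = sum(x)
    return [v / s for v in x]

w = [0.0, 1.0, math.log2(5) / 2]  # illustrative weights only
p_star = maximizer(w)
best = zeta(p_star, w)
closed_form = math.log2(sum(2.0 ** wa for wa in w))

rng = random.Random(0)  # fixed seed so the check is reproducible
gap = max(zeta(random_distribution(len(w), rng), w) - best
          for _ in range(1000))
print(best, closed_form, gap)
```

A non-positive `gap` is consistent with the strict concavity argument, although random sampling is of course not a proof of optimality.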

Appendix G Proof of Theorem 12

We prove Theorem 12 in two steps, which are Lemma 14 and Lemma 15. The proofs are respectively given in App. G-A and G-B.

Lemma 14.

		$\displaystyle C_{0}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in% \mathcal{A}}C_{0}(G_{a})$		(278)
	$\displaystyle\Longrightarrow\;$	$\displaystyle\forall(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\prod_{a\in% \mathcal{A}}\mathcal{P}^{\star}(G_{a}),\begin{cases}\bigotimes_{a\in\mathcal{A% }}P^{\star}_{X_{a}}\in\mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a% }\right)\text{ and }\\ C\left(\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in\mathcal{A}}P^{\star% }_{X_{a}}\right)=\sum_{a\in\mathcal{A}}C(G_{a},P^{\star}_{X_{a}}).\end{cases}$		(279)

Lemma 15.

For all $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\bigwedge_{a\in% \mathcal{A}}G_{a}\right)$ , the following holds

		$\displaystyle C\left(\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)=\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})$		(280)
	$\displaystyle\Longrightarrow\;$	$\displaystyle C_{0}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in% \mathcal{A}}C_{0}(G_{a})\text{ and }\forall a\in\mathcal{A},\;P_{X_{a}}\in% \mathcal{P}^{\star}(G_{a}).$		(281)

Let us prove Theorem 12. We consider a family of distributions $P^{\star}_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ with $a\in\mathcal{A}$ . By Lemma 14, we have

	$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)=% \textstyle\sum_{a\in\mathcal{A}}C_{0}(G_{a})$	(282)
$\displaystyle\Longrightarrow\;$	$\displaystyle\textstyle\bigotimes_{a\in\mathcal{A}}P^{\star}_{X_{a}}\in% \mathcal{P}^{\star}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)\text{ and }C% \left(\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in\mathcal{A}}P^{\star}% _{X_{a}}\right)=\sum_{a\in\mathcal{A}}C(G_{a},P^{\star}_{X_{a}})$	(283)
$\displaystyle\Longrightarrow\;$	$\displaystyle\exists P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)\!,\;\;C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)=\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}}).$	(284)

Conversely, by Lemma 15 we have

		$\displaystyle\exists P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)\!,\;\;C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)=\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})$		(285)
	$\displaystyle\Longrightarrow\;$	$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)=% \textstyle\sum_{a\in\mathcal{A}}C_{0}(G_{a}).$		(286)

Moreover, any distribution $P_{X_{1},...,X_{|\mathcal{A}|}}$ that satisfies (285) also satisfies $P_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ for all $a\in\mathcal{A}$ .

G-A Proof of Lemma 14

For every family of graphs $(G_{a})_{a\in\mathcal{A}}$ and every family of distributions $P^{\star}_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ with $a\in\mathcal{A}$ , we have

$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle=\max_{P_{X_{1},...,X_{|\mathcal{A}|}}}\;C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)$	(287)
	$\displaystyle\geq C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;% \bigotimes_{a\in\mathcal{A}}P^{\star}_{X_{a}}\right)$	(288)
	$\displaystyle\geq\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P^{\star}_{X_{a}})$	(289)
	$\displaystyle=\textstyle\sum_{a\in\mathcal{A}}C_{0}(G_{a});$	(290)

where (287) comes from Theorem 10; (289) comes from Proposition 5; and (290) follows from the hypothesis $P^{\star}_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ for all $a\in\mathcal{A}$ .

Now assume that $\sum_{a\in\mathcal{A}}C_{0}(G_{a})=C_{0}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)$ , then equality holds between the left-hand side of (287) and the term in (290). Therefore, we have

	$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)=C% \left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{a\in\mathcal{A}% }P^{\star}_{X_{a}}\right),\text{ hence }\textstyle\bigotimes_{a\in\mathcal{A}}% P^{\star}_{X_{a}}\in\mathcal{P}^{\star}\left(\textstyle\bigwedge_{a\in\mathcal% {A}}G_{a}\right);$		(291)
	$\displaystyle C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;\bigotimes_{% a\in\mathcal{A}}P^{\star}_{X_{a}}\right)=\textstyle\sum_{a\in\mathcal{A}}C(G_{% a},P^{\star}_{X_{a}}).$		(292)
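As a one-shot sanity check of the kind of additivity assumed in (278), one can compare independence numbers of two small graphs with that of their AND product. The sketch below assumes the path $P_{3}$ and the cycle $C_{4}$ (both perfect) as illustrative graphs, with hypothetical helper names. It only computes independence numbers at block length 1, which lower-bound $2^{C_{0}}$; it does not compute $C_{0}$ itself.

```python
from itertools import combinations, product

def path(n):
    """Path on vertices 0..n-1, returned as (vertices, edges)."""
    return list(range(n)), {frozenset((i, i + 1)) for i in range(n - 1)}

def cycle(n):
    """Cycle on vertices 0..n-1, returned as (vertices, edges)."""
    return list(range(n)), {frozenset((i, (i + 1) % n)) for i in range(n)}

def and_product(g, h):
    """AND product: distinct pairs are adjacent when every coordinate
    is equal or adjacent."""
    (vg, eg), (vh, eh) = g, h

    def close(edges, a, b):
        return a == b or frozenset((a, b)) in edges

    verts = list(product(vg, vh))
    edges = {
        frozenset((x, y)) for x, y in combinations(verts, 2)
        if close(eg, x[0], y[0]) and close(eh, x[1], y[1])
    }
    return verts, edges

def independence_number(graph):
    """Largest set of pairwise non-adjacent vertices (brute force,
    only practical for very small graphs)."""
    verts, edges = graph
    for k in range(len(verts), 0, -1):
        for s in combinations(verts, k):
            if not any(frozenset(p) in edges for p in combinations(s, 2)):
                return k
    return 0

g, h = path(3), cycle(4)
alpha_g = independence_number(g)
alpha_h = independence_number(h)
alpha_gh = independence_number(and_product(g, h))
print(alpha_g, alpha_h, alpha_gh)
```

Here the product of the two independence numbers matches the independence number of the product, which is the block-length-1 analogue of the equality $C_{0}\left(\bigwedge_{a}G_{a}\right)=\sum_{a}C_{0}(G_{a})$.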

G-B Proof of Lemma 15

Let $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}(\bigwedge_{a\in\mathcal{% A}}G_{a})$ , and let $P^{\star}_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ , for all $a\in\mathcal{A}$ . The following holds

$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle=C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)$	(293)
	$\displaystyle\geq C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;% \textstyle\bigotimes_{a\in\mathcal{A}}P^{\star}_{X_{a}}\right)$	(294)
	$\displaystyle\geq\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P^{\star}_{X_{a}})$	(295)
	$\displaystyle\geq\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}});$	(296)

where (294) comes from the hypothesis $P_{X_{1},...,X_{|\mathcal{A}|}}\in\mathcal{P}^{\star}(\bigwedge_{a\in\mathcal{% A}}G_{a})$ ; (295) comes from Proposition 5; and (296) comes from the hypothesis $P^{\star}_{X_{a}}\in\mathcal{P}^{\star}(G_{a})$ , for all $a\in\mathcal{A}$ .

Now assume that

\displaystyle C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...% ,X_{|\mathcal{A}|}}\right)=\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}}).

(297)

Then equality holds between the right-hand side of (293) and the term in (296). In particular, we have for all $a\in\mathcal{A}$

\displaystyle C(G_{a},P_{X_{a}})=C(G_{a},P^{\star}_{X_{a}}),

(298)

which implies that $P_{X_{a}}$ also maximizes $C(G_{a},\cdot)$ for all $a\in\mathcal{A}$ :

\displaystyle\forall a\in\mathcal{A},\;P_{X_{a}}\in\mathcal{P}^{\star}(G_{a}).

(299)

Furthermore,

$\displaystyle C_{0}\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle=C\left(\textstyle\bigwedge_{a\in\mathcal{A}}G_{a},\;P_{X_{1},...,X_{|\mathcal{A}|}}\right)$	(300)
	$\displaystyle=\textstyle\sum_{a\in\mathcal{A}}C(G_{a},P_{X_{a}})$	(301)
	$\displaystyle=\textstyle\sum_{a\in\mathcal{A}}C_{0}(G_{a});$	(302)

where (301) is a consequence of the equality in equations (293)-(296), and (302) comes from (299).

Appendix H Proof of Theorem 13

The techniques used in this proof are the same as in the proof of Theorem 12. We prove Theorem 13 in two steps, which are Lemma 16 and Lemma 17; their proofs are respectively given in App. H-A and H-B.

Lemma 16.

Let

\displaystyle P^{\star}_{A}\doteq\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime% }\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}},

(303)

we have

	$\displaystyle C_{0}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=\log\left(% \sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$
$\displaystyle\Longrightarrow\;$	$\displaystyle\forall(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\prod_{a\in% \mathcal{A}}\mathcal{P}^{\star}(G_{a}),\;\sum_{a\in\mathcal{A}}P^{\star}_{A}(a% )P^{\star}_{X_{a}}\in\mathcal{P}^{\star}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}% \right)\text{ and }$
	$\displaystyle C\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum_{a\in% \mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}\right)=H(P^{\star}_{A})+\sum_{a% \in\mathcal{A}}P^{\star}_{A}(a)C(G_{a},P^{\star}_{X_{a}}),$	(304)

Lemma 17.

Let

\displaystyle P^{\star}_{A}\doteq\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime% }\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}},

(305)

for all $\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\in\mathcal{P}^{\star}\left(\bigsqcup_{% a\in\mathcal{A}}G_{a}\right)$ the following holds

		$\displaystyle C\left(\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum_{a\in% \mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\sum_{a\in\mathcal{A}}P_{A}(a)C(% G_{a},P_{X_{a}})$
	$\displaystyle\Longrightarrow\;$	$\displaystyle C_{0}\left(\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=\log\left(% \sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)\!,\,(P_{X_{a}})_{a\in\mathcal{A}% }\in\prod_{a\in\mathcal{A}}\mathcal{P}^{\star}(G_{a}),\text{ and }P_{A}=P^{% \star}_{A}.$		(306)

Now let us prove Theorem 13. Let $(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\prod_{a\in\mathcal{A}}\mathcal{P}^{% \star}(G_{a})$ , we have by Lemma 16

	$\displaystyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=% \log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$	(307)
$\displaystyle\Longrightarrow\;$	$\displaystyle\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}% \in\mathcal{P}^{\star}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)% \text{ and }$
	$\displaystyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;% \textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}\right)=H(P^{% \star}_{A})+\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)C(G_{a},P^{\star}_% {X_{a}}),$	(308)
$\displaystyle\Longrightarrow\;$	$\displaystyle\exists P_{X}\in\mathcal{P}^{\star}\left(\textstyle\bigsqcup_{a% \in\mathcal{A}}G_{a}\right)\!,\,$
	$\displaystyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;% \textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}\right)=H(P^{% \star}_{A})+\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)C(G_{a},P^{\star}_% {X_{a}}),$

where $P^{\star}_{X_{a}}=P_{X|X\in\mathcal{X}_{a}}$ and $P^{\star}_{A}(a)=P_{X}(\mathcal{X}_{a})$ for all $a\in\mathcal{A}$ .

Conversely, by Lemma 17 we have

	$\displaystyle\exists P_{X}\in\mathcal{P}^{\star}\left(\textstyle\bigsqcup_{a% \in\mathcal{A}}G_{a}\right)\!,$	(309)
	$\displaystyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;% \textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\textstyle% \sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a}})$
$\displaystyle\Longrightarrow\;$	$\displaystyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=% \log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right),$	(310)

and any $P_{X}=\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}$ that satisfies (309) also satisfies

\displaystyle(P_{X_{a}})_{a\in\mathcal{A}}\in\prod_{a\in\mathcal{A}}\mathcal{P% }^{\star}(G_{a}),\text{ and }P_{A}=\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{% \prime}\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}}.

(311)

H-A Proof of Lemma 16

Now let us prove Lemma 16. Let

	$\displaystyle\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\in\mathcal{P}^{% \star}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right),$		(312)
	$\displaystyle(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\textstyle\prod_{a\in% \mathcal{A}}\mathcal{P}^{\star}(G_{a}),$		(313)
	$\displaystyle P^{\star}_{A}\doteq\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime% }\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}}.$		(314)

We have

$\displaystyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)$	$\displaystyle=C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum% _{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)$	(315)
	$\displaystyle\geq C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;% \sum_{a\in\mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}\right)$	(316)
	$\displaystyle\geq H(P^{\star}_{A})+\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{% A}(a)C(G_{a},P^{\star}_{X_{a}})$	(317)
	$\displaystyle=H(P^{\star}_{A})+\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a% )C_{0}(G_{a})$	(318)
	$\displaystyle=\log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right);$	(319)

where (315) and (316) come from (312) and Proposition 6; (317) comes from Proposition 5; (318) comes from (313) and Proposition 6; and (319) comes from (314) and Lemma 5.

Assume that $C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=\log\left(% \textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$ , then equality holds in (315) to (319), therefore the following holds:

	$\displaystyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=% \log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$
$\displaystyle\Longrightarrow\;$	$\displaystyle\forall(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\textstyle\prod_{a% \in\mathcal{A}}\mathcal{P}^{\star}(G_{a}),\;\textstyle\sum_{a\in\mathcal{A}}P^% {\star}_{A}(a)P^{\star}_{X_{a}}\in\mathcal{P}^{\star}\left(\textstyle\bigsqcup% _{a\in\mathcal{A}}G_{a}\right)\text{ and }$
	$\displaystyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum% _{a\in\mathcal{A}}P^{\star}_{A}(a)P^{\star}_{X_{a}}\right)=H(P^{\star}_{A})+% \textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a)C(G_{a},P^{\star}_{X_{a}}).$	(320)

H-B Proof of Lemma 17

Let

	$\displaystyle\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\in\mathcal{P}^{% \star}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right),$		(321)
	$\displaystyle(P^{\star}_{X_{a}})_{a\in\mathcal{A}}\in\textstyle\prod_{a\in% \mathcal{A}}\mathcal{P}^{\star}(G_{a}),$		(322)
	$\displaystyle P^{\star}_{A}\doteq\left(\frac{2^{C_{0}(G_{a})}}{\sum_{a^{\prime% }\in\mathcal{A}}2^{C_{0}(G_{a^{\prime}})}}\right)_{a\in\mathcal{A}}.$		(323)

We have

$\displaystyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum% _{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)$	$\displaystyle=C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)$	(324)
	$\displaystyle\geq\log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right)$	(325)
	$\displaystyle=H(P^{\star}_{A})+\textstyle\sum_{a\in\mathcal{A}}P^{\star}_{A}(a% )C_{0}(G_{a})$	(326)
	$\displaystyle\geq H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)C_{0}(G_{a})$	(327)
	$\displaystyle=H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P^{% \star}_{X_{a}})$	(328)
	$\displaystyle\geq H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{% X_{a}});$	(329)

where (324) comes from (321) and Proposition 6; (325) comes from (29), see [3, Theorem 4]; (326) and (327) come from (323) and Lemma 5, whose proof can be found in App. F-B; (328) and (329) come from (322) and Proposition 6.

Assume that $C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_{a},\;\sum_{a\in\mathcal% {A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)% C(G_{a},P_{X_{a}})$ , then equality holds in (324) to (329). In particular $P_{A}=P^{\star}_{A}$ as a consequence of the equality between (326) and (327); and $(P_{X_{a}})_{a\in\mathcal{A}}\in\prod_{a\in\mathcal{A}}\mathcal{P}^{\star}(G_{% a})$ as a consequence of the equality between (328) and (329). Thus, for all $\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\in\mathcal{P}^{\star}\left(\textstyle% \bigsqcup_{a\in\mathcal{A}}G_{a}\right)$ the following holds:

		$\displaystyle\textstyle C\left(\textstyle\bigsqcup^{P_{A}}_{a\in\mathcal{A}}G_% {a},\;\sum_{a\in\mathcal{A}}P_{A}(a)P_{X_{a}}\right)=H(P_{A})+\textstyle\sum_{% a\in\mathcal{A}}P_{A}(a)C(G_{a},P_{X_{a}})$
	$\displaystyle\Longrightarrow\;$	$\displaystyle C_{0}\left(\textstyle\bigsqcup_{a\in\mathcal{A}}G_{a}\right)=% \log\left(\textstyle\sum_{a\in\mathcal{A}}2^{C_{0}(G_{a})}\right),\;(P_{X_{a}}% )_{a\in\mathcal{A}}\in\prod_{a\in\mathcal{A}}\mathcal{P}^{\star}(G_{a}),\text{% and }P_{A}=P^{\star}_{A}.$		(330)

Appendix I Proof of Theorem 17

Lemma 18 comes from [42, Corollary 3.4], and states that the function $P_{A}\mapsto H_{\kappa}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}$ , defined analogously to $P_{A}\mapsto\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}$ , is always linear. The proof of Lemma 19 is given in App. I-A.

Lemma 18 (from [42, Corollary 3.4]).

For all probabilistic graphs $(G_{a})_{a\in\mathcal{A}}$ and $P_{A}\in\Delta(\mathcal{A})$ , we have $H_{\kappa}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}=\sum_{a\in% \mathcal{A}}P_{A}(a)H_{\kappa}(G_{a})$ .

Lemma 19.

The probabilistic graph $\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}$ is perfect if and only if $G_{a}$ is perfect for all $a\in\mathcal{A}$ .

Now let us prove Theorem 17.

For all $a\in\mathcal{A}$ , let $G_{a}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})$ be a perfect probabilistic graph. By Lemma 19, $\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}$ is also perfect; and we have $\overline{H}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}=H_{\kappa}% \big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}$ by Theorem 14. We also have $H_{\kappa}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}=\sum_{a\in% \mathcal{A}}P_{A}(a)H_{\kappa}(G_{a})=\sum_{a\in\mathcal{A}}P_{A}(a)\overline{% H}(G_{a})$ by Lemma 18 and Theorem 14 used on the perfect graphs $(G_{a})_{a\in\mathcal{A}}$ . Thus

\displaystyle\overline{H}\big{(}\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G% _{a}\big{)}=\textstyle\sum_{a\in\mathcal{A}}P_{A}(a)\overline{H}(G_{a}).

(331)

By Theorem 4, it follows that $\overline{H}\left(\bigwedge_{a\in\mathcal{A}}G_{a}\right)=\sum_{a\in\mathcal{A% }}\overline{H}(G_{a})=\sum_{a\in\mathcal{A}}H_{\kappa}(G_{a})$ , where the last equality comes from Theorem 14.

I-A Proof of Lemma 19

$(\Longrightarrow)$ Let $G=\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}$ be a perfect probabilistic graph. Let $a^{\prime}\in\mathcal{A}$ and $\mathcal{S}_{a^{\prime}}\subset\mathcal{X}_{a^{\prime}}$ . We have $\chi\big{(}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}[\mathcal{S}_% {a^{\prime}}]\big{)}=\omega\big{(}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_% {a}\big{)}[\mathcal{S}_{a^{\prime}}]\big{)}$ since $G$ is perfect, and therefore $\chi(G_{a^{\prime}}[\mathcal{S}_{a^{\prime}}])=\omega(G_{a^{\prime}}[\mathcal{% S}_{a^{\prime}}])$ , as $\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}[\mathcal{S}_{a^{\prime}% }]=G_{a^{\prime}}[\mathcal{S}_{a^{\prime}}]$ . Thus all the graphs $(G_{a})_{a\in\mathcal{A}}$ are perfect.

$(\Longleftarrow)$ Conversely, assume that for all $a\in\mathcal{A}$ , $G_{a}=(\mathcal{X}_{a},\mathcal{E}_{a},P_{X_{a}})$ is perfect. Then for all $\mathcal{S}\subset\bigsqcup_{a\in\mathcal{A}}\mathcal{X}_{a}$ , $\mathcal{S}$ can be written as $\bigsqcup_{a\in\mathcal{A}}\mathcal{S}_{a}$ where $\mathcal{S}_{a}\subset\mathcal{X}_{a}$ for all $a\in\mathcal{A}$ , and we have for all $P_{A}\in\Delta(\mathcal{A})$ :

$\displaystyle\chi\left(\left(\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a% }\right)[\mathcal{S}]\right)$	$\displaystyle=\chi\left(\textstyle\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}[% \mathcal{S}_{a}]\right)$	(332)
	$\displaystyle=\max_{a\in\mathcal{A}}\chi\left(G_{a}[\mathcal{S}_{a}]\right)$	(333)
	$\displaystyle=\max_{a\in\mathcal{A}}\omega\left(G_{a}[\mathcal{S}_{a}]\right),$	(334)

and similarly, $\omega\big{(}\big{(}\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}\big{)}[\mathcal{S% }]\big{)}=\max_{a\in\mathcal{A}}\omega\left(G_{a}[\mathcal{S}_{a}]\right)$ . Hence $\bigsqcup_{a\in\mathcal{A}}^{P_{A}}G_{a}$ is also perfect.
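Identity (333) and its clique-number counterpart can be verified by brute force on small graphs. The sketch below assumes graphs given as a vertex count and an edge set, with the path $P_{3}$, the triangle $K_{3}$ and the cycle $C_{4}$ as illustrative components; the helper names are hypothetical and the exhaustive searches are only meant for a handful of vertices.

```python
from itertools import combinations, product

def disjoint_union(g, h):
    """Graphs are (n, edges) on vertices 0..n-1; the vertices of h are
    shifted by the order of g, and no edge joins the two parts."""
    (ng, eg), (nh, eh) = g, h
    return ng + nh, set(eg) | {(u + ng, v + ng) for u, v in eh}

def clique_number(graph):
    """Size of a largest set of pairwise adjacent vertices."""
    n, edges = graph
    adj = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for s in combinations(range(n), k):
            if all(frozenset(p) in adj for p in combinations(s, 2)):
                return k
    return 0

def chromatic_number(graph):
    """Smallest number of colors admitting a proper coloring."""
    n, edges = graph
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return 0

p3 = (3, {(0, 1), (1, 2)})
k3 = (3, {(0, 1), (1, 2), (0, 2)})
c4 = (4, {(0, 1), (1, 2), (2, 3), (3, 0)})

for g, h in [(p3, k3), (p3, c4), (k3, c4)]:
    u = disjoint_union(g, h)
    chi_ok = chromatic_number(u) == max(chromatic_number(g), chromatic_number(h))
    omega_ok = clique_number(u) == max(clique_number(g), clique_number(h))
    print(chi_ok, omega_ok)
```

The probability distribution $P_{A}$ plays no role in this check, since perfection only depends on the underlying graph structure.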

References

[1] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[2] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[3] C. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
[4] A. Vesel and J. Žerovnik, “Improved lower bound on the Shannon capacity of $C_{7}$ ,” Information Processing Letters, vol. 81, no. 5, pp. 277–282, 2002.
[5] B. Codenotti, I. Gerace, and G. Resta, “Some remarks on the Shannon capacity of odd cycles,” Ars Combinatoria, vol. 66, pp. 243–258, 2003.
[6] S. C. Polak and A. Schrijver, “New lower bound on the Shannon capacity of $C_{7}$ from circular graphs,” Information Processing Letters, vol. 143, pp. 37–40, 2019.
[7] I. Csiszár and J. Körner, Information theory: coding theorems for discrete memoryless systems. Cambridge University Press, 2011.
[8] S. Klavzar, R. Hammack, and W. Imrich, “Handbook of graph products,” 2011.
[9] C. Berge, Graphs and Hypergraphs, ser. North-Holland mathematical library. Amsterdam, 1973.
[10] M. Grötschel, L. Lovász, and A. Schrijver, “Polynomial algorithms for perfect graphs,” Ann. Discrete Math, vol. 21, pp. 325–356, 1984.
[11] C. Berge, “Farbung von graphen, deren samtliche bzw. deren ungerade kreise starr sind,” Wissenschaftliche Zeitschrift, 1961.
[12] M. Chudnovsky, N. Robertson, P. D. Seymour, and R. Thomas, “Progress on perfect graphs,” Mathematical Programming, vol. 97, no. 1-2, pp. 405–422, 2003.
[13] L. Lovász, “On the Shannon capacity of a graph,” IEEE Transactions on Information Theory, vol. 25, no. 1, pp. 1–7, 1979.
[14] H. Witsenhausen, “The zero-error side information problem and chromatic numbers (corresp.),” IEEE Transactions on Information Theory, vol. 22, no. 5, pp. 592–593, 1976.
[15] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Transactions on information Theory, vol. 19, no. 4, pp. 471–480, 1973.
[16] N. Alon and A. Orlitsky, “Source coding and graph entropies,” IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1329–1339, 1996.
[17] P. Koulgi, E. Tuncel, S. L. Regunathan, and K. Rose, “On zero-error source coding with decoder side information,” IEEE Transactions on Information Theory, vol. 49, no. 1, pp. 99–111, 2003.
[18] J. Körner and G. Longo, “Two-step encoding for finite sources,” IEEE Transactions on Information Theory, vol. 19, no. 6, pp. 778–782, 1973.
[19] J. Körner, “Coding of an information source having ambiguous alphabet and the entropy of graphs,” Transactions of the 6th Prague Conference on Information Theory, pp. 411–425, 1973.
[20] W. Haemers, “On some problems of Lovász concerning the Shannon capacity of a graph,” IEEE Transactions on Information Theory, vol. 25, no. 2, pp. 231–232, 1979.
[21] E. Tuncel, J. Nayak, P. Koulgi, and K. Rose, “On complementary graph entropy,” IEEE transactions on information theory, vol. 55, no. 6, pp. 2537–2546, 2009.
[22] A. Wigderson and J. Zuiddam, Asymptotic spectra: Theory, applications and extensions. manuscript, 2023.
[23] A. Schrijver, “On the Shannon capacity of sums and products of graphs,” Indagationes Mathematicae, vol. 34, no. 1, pp. 37–41, 2023.
[24] I. Csiszár and J. Körner, “On the capacity of the arbitrarily varying channel for maximum probability of error,” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 57, no. 1, pp. 87–101, 1981.
[25] L. Gargano, J. Körner, and U. Vaccaro, “Capacities: from information theory to extremal set theory,” Journal of Combinatorial Theory, Series A, vol. 68, no. 2, pp. 296–316, 1994.
[26] K. Marton, “On the Shannon capacity of probabilistic graphs,” Journal of Combinatorial Theory, Series B, vol. 57, no. 2, pp. 183–195, 1993.
[27] L. Lovász, “On the Shannon capacity of a graph,” IEEE Transactions on Information theory, vol. 25, no. 1, pp. 1–7, 1979.
[28] I. Sason, “Observations on the Lovász $\theta$ -function, graph capacity, eigenvalues, and strong products,” Entropy, vol. 25, no. 1, p. 104, 2023.
[29] A. Vesel and J. Žerovnik, “Improved lower bound on the Shannon capacity of $C_{7}$ ,” Information processing letters, vol. 81, no. 5, pp. 277–282, 2002.
[30] K. A. Mathew and P. R. Östergård, “New lower bounds for the Shannon capacity of odd cycles,” Designs, Codes and Cryptography, vol. 84, pp. 13–22, 2017.
[31] S. C. Polak and A. Schrijver, “New lower bound on the Shannon capacity of $C_{7}$ from circular graphs,” Information Processing Letters, vol. 143, pp. 37–40, 2019.
[32] M. Chudnovsky and P. D. Seymour, “The structure of claw-free graphs.” in BCC, 2005, pp. 153–171.
[33] B. Bukh and C. Cox, “On a fractional version of Haemers’ bound,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3340–3348, 2018.
[34] L. Gao, S. Gribling, and Y. Li, “On a tracial version of Haemers bound,” IEEE Transactions on Information Theory, 2022.
[35] G. Simonyi, “Perfect graphs and graph entropy. An updated survey,” in Perfect Graphs, B. A. Reed and J. L. R. Alfonsin, Eds. John Wiley & Sons, 2001, ch. 13, pp. 293–328.
[36] H. Touchette, “Legendre-Fenchel transforms in a nutshell,” URL http://www.maths.qmul.ac.uk/~ht/archive/lfth2.pdf, 2005.
[37] I. Csiszár, J. Körner, L. Lovász, K. Marton, and G. Simonyi, “Entropy splitting for antiblocking corners and perfect graphs,” Combinatorica, vol. 10, no. 1, pp. 27–40, 1990.
[38] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas, “The strong perfect graph theorem,” Annals of mathematics, pp. 51–229, 2006.
[39] P. J. Cameron, “6-transitive graphs,” Journal of Combinatorial Theory, Series B, vol. 28, no. 2, pp. 168–179, 1980.
[40] N. Alon, “The Shannon capacity of a union,” Combinatorica, vol. 18, no. 3, pp. 301–310, 1998.
[41] J. Zuiddam et al., Algebraic complexity, asymptotic spectra and entanglement polytopes. Institute for Logic, Language and Computation, 2018.
[42] G. Simonyi, “Graph entropy: a survey,” Combinatorial Optimization, vol. 20, pp. 399–441, 1995.

	$\displaystyle|\eta(P_{A})-\eta(P^{\prime}_{A})|$	$\displaystyle\leq\lim_{n\rightarrow\infty}\log\left(\max_{a}|\mathcal{X}_{a}|\right)\cdot\|T_{\overline{a}^{n}}-T_{\overline{a}^{\prime n}}\|_{1}$		(145)
		$\displaystyle=\log\left(\max_{a}|\mathcal{X}_{a}|\right)\cdot\|P_{A}-P^{\prime}_{A}\|_{1}.$		(146)

	$\displaystyle\frac{\log|\mathcal{C}_{n}|}{n}\underset{n\rightarrow\infty}{\rightarrow}C(G,P_{X}),$	$\displaystyle\frac{\log|\mathcal{C}^{\prime}_{n}|}{n}\underset{n\rightarrow\infty}{\rightarrow}C(G,P^{\prime}_{X}),$		(208)
	$\displaystyle\max_{x^{n}\in\mathcal{C}_{n}}\|T_{x^{n}}-P_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0,$	$\displaystyle\max_{x^{n}\in\mathcal{C}^{\prime}_{n}}\|T_{x^{n}}-P^{\prime}_{X}\|_{\infty}\underset{n\rightarrow\infty}{\rightarrow}0.$		(209)
